Kubernetes) Installing Trino with YAML
MightyTedKim
2022. 2. 25. 16:58
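Spark Thrift Server turned out to be hard to manage, so I started studying Trino.
If you already have a Thrift server set up, you can reuse the same metastore and MySQL as-is, so you should get to "hello world" quickly.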
https://github.com/joshuarobinson/trino-on-k8s
https://joshua-robinson.medium.com/presto-powered-s3-data-warehouse-on-kubernetes-aea89d2f40e8
Result
$ k get all -n trino
NAME                                     READY   STATUS    RESTARTS   AGE
pod/trino-cli                            1/1     Running   0          35d
pod/trino-coordinator-574c748c86-j56pt   1/1     Running   0          35d
pod/trino-worker-0                       1/1     Running   0          35d
pod/trino-worker-1                       1/1     Running   0          35d
pod/trino-worker-2                       1/1     Running   0          35d

NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/trino   ClusterIP   10.233.8.251   <none>        8080/TCP   35d

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trino-coordinator   1/1     1            1           35d

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/trino-coordinator-574c748c86   1         1         1       35d

NAME                            READY   AGE
statefulset.apps/trino-worker   3/3     35d
Files
Only four files are needed:
- s3-secret.yaml
- trino-cfgs.yaml
- trino.yaml
- trino-svc-np.yaml
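The four files above can be applied in one pass. The following is a minimal sketch, not the exact commands from this post: the function name apply_trino is hypothetical, and it assumes the trino namespace may not exist yet. It passes -n trino because the ConfigMap manifest does not set a namespace itself. Note that trino.yaml and trino-svc-np.yaml both define a Service named trino, so whichever is applied last decides the Service type.

```shell
# Apply the four manifests in the order they are listed in this post.
# apply_trino is a hypothetical helper name, not part of kubectl.
apply_trino() {
  # Assumption: create the namespace only if it is missing.
  kubectl get namespace trino >/dev/null 2>&1 || kubectl create namespace trino

  for f in s3-secret.yaml trino-cfgs.yaml trino.yaml trino-svc-np.yaml; do
    # -n trino covers trino-cfgs.yaml, which has no namespace in its metadata.
    kubectl apply -n trino -f "$f" || return 1
  done
}
```

Run apply_trino from the directory that holds the four YAML files.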
Apply them in order. These are the steps I went through.
First, create the secret.
$ cat s3-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-keys
  namespace: trino
type: Opaque
data:
  access-key: ****==
  secret-key: ****==
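Values under data: in a Secret must be base64-encoded. A minimal sketch of producing them, using hypothetical placeholder credentials (the real keys are masked in this post):

```shell
# Hypothetical placeholder credentials; substitute the real S3 keys.
ACCESS_KEY='my-access-key'
SECRET_KEY='my-secret-key'

# printf avoids the trailing newline that echo would add to the encoded value.
ACCESS_KEY_B64=$(printf '%s' "$ACCESS_KEY" | base64)
SECRET_KEY_B64=$(printf '%s' "$SECRET_KEY" | base64)

# Paste these two lines under data: in s3-secret.yaml.
echo "access-key: $ACCESS_KEY_B64"
echo "secret-key: $SECRET_KEY_B64"
```

If you would rather not encode by hand, a Secret also accepts plain values under stringData: instead of data:.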
The coordinator is both the JDBC endpoint and the web UI, so expose it through a NodePort.
$ cat trino-svc-np.yaml
apiVersion: v1
kind: Service
metadata:
  name: trino
  namespace: trino
spec:
  type: NodePort
  selector:
    app: trino-coordinator
  ports:
    - name: trino-ui
      nodePort: 30099
      port: 8080
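Once the NodePort Service is up, the coordinator should be reachable from outside the cluster at any node IP on the nodePort. A small sketch for checking it through Trino's /v1/info endpoint; the helper name trino_info_url is hypothetical, and the default port is the 30099 from the manifest above:

```shell
# Build the coordinator info URL for a given node IP.
# trino_info_url is a hypothetical helper; the port defaults to the nodePort above.
trino_info_url() {
  node_ip=$1
  node_port=${2:-30099}
  printf 'http://%s:%s/v1/info\n' "$node_ip" "$node_port"
}
```

For example, curl -s "$(trino_info_url 10.240.35.32)" should return a small JSON document if the coordinator is up; opening the same host and port in a browser shows the web UI.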
Next, create the ConfigMap.
$ cat trino-cfgs.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: trino-configs
data:
  jvm.config: |-
    -server
    -Xmx16G
    -XX:-UseBiasedLocking
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+ExitOnOutOfMemoryError
    -XX:+UseGCOverheadLimit
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:ReservedCodeCacheSize=512M
    -Djdk.attach.allowAttachSelf=true
    -Djdk.nio.maxCachedBufferSize=2000000
  config.properties.coordinator: |-
    coordinator=true
    node-scheduler.include-coordinator=false
    http-server.http.port=8080
    query.max-memory=200GB
    query.max-memory-per-node=8GB
    query.max-total-memory-per-node=10GB
    query.max-stage-count=200
    task.writer-count=4
    discovery-server.enabled=true
    discovery.uri=http://trino:8080
  config.properties.worker: |-
    coordinator=false
    http-server.http.port=8080
    query.max-memory=200GB
    query.max-memory-per-node=10GB
    query.max-total-memory-per-node=10GB
    query.max-stage-count=200
    task.writer-count=4
    discovery.uri=http://trino:8080
  node.properties: |-
    node.environment=test
    spiller-spill-path=/tmp
    max-spill-per-node=4TB
    query-max-spill-per-node=1TB
  hive.properties: |-
    connector.name=hive-hadoop2
    #hive.metastore.uri=thrift://metastore:9083
    hive.metastore.uri=thrift://10.100.210.35:9083
    hive.allow-drop-table=true
    hive.max-partitions-per-scan=1000000
    #hive.s3.endpoint=10.62.64.200
    hive.s3.endpoint=10.106.47.55
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
    hive.s3.max-connections=100
  iceberg.properties: |-
    connector.name=iceberg
    hive.metastore.uri=thrift://metastore:9083
    hive.max-partitions-per-scan=1000000
    hive.s3.endpoint=10.62.64.200
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
    hive.s3.max-connections=100
  mysql.properties: |-
    connector.name=mysql
    #connection-url=jdbc:mysql://metastore-db.default.svc.cluster.local:13306
    connection-url=jdbc:mysql://10.106.121.238:3306
    #connection-user=root
    connection-user=root
    #connection-password=mypass
    connection-password=hgkimpwd
Finally, deploy Trino. This is only a test setup, so I kept the resource requests low.
(Tistory threw an error on this one, so it goes in a code block.)
$ cat trino.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: trino
  namespace: trino
spec:
  ports:
    - port: 8080
  selector:
    app: trino-coordinator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trino-coordinator
  namespace: trino
spec:
  selector:
    matchLabels:
      app: trino-coordinator
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: trino-coordinator
    spec:
      containers:
        - name: trino
          #image: trinodb/trino:355
          image: hgkim.repo/library/trinodb/trino:365
          ports:
            - containerPort: 8080
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: my-s3-keys
                  key: access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: my-s3-keys
                  key: secret-key
          volumeMounts:
            - name: trino-cfg-vol
              mountPath: /etc/trino/jvm.config
              subPath: jvm.config
            - name: trino-cfg-vol
              mountPath: /etc/trino/config.properties
              subPath: config.properties.coordinator
            - name: trino-cfg-vol
              mountPath: /etc/trino/node.properties
              subPath: node.properties
            - name: trino-cfg-vol
              mountPath: /etc/trino/catalog/hive.properties
              subPath: hive.properties
            - name: trino-cfg-vol
              mountPath: /etc/trino/catalog/iceberg.properties
              subPath: iceberg.properties
            - name: trino-cfg-vol
              mountPath: /etc/trino/catalog/mysql.properties
              subPath: mysql.properties
          resources:
            requests:
              memory: "4G"
              #memory: "16G"
              #cpu: 4
              cpu: 1
          imagePullPolicy: Always
      volumes:
        - name: trino-cfg-vol
          configMap:
            name: trino-configs
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: trino-worker
  namespace: trino
spec:
  serviceName: trino-worker
  #replicas: 10
  replicas: 2
  selector:
    matchLabels:
      app: trino-worker
  template:
    metadata:
      labels:
        app: trino-worker
    spec:
      securityContext:
        fsGroup: 1000
      containers:
        - name: trino
          #image: trinodb/trino:355
          image: hgkim.repo/library/trinodb/trino:365
          ports:
            - containerPort: 8080
          env:
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  name: my-s3-keys
                  key: access-key
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  name: my-s3-keys
                  key: secret-key
          volumeMounts:
            - name: trino-cfg-vol
              mountPath: /etc/trino/jvm.config
              subPath: jvm.config
            - name: trino-cfg-vol
              mountPath: /etc/trino/config.properties
              subPath: config.properties.worker
            - name: trino-cfg-vol
              mountPath: /etc/trino/node.properties
              subPath: node.properties
            - name: trino-cfg-vol
              mountPath: /etc/trino/catalog/hive.properties
              subPath: hive.properties
            - name: trino-cfg-vol
              mountPath: /etc/trino/catalog/iceberg.properties
              subPath: iceberg.properties
            - name: trino-cfg-vol
              mountPath: /etc/trino/catalog/mysql.properties
              subPath: mysql.properties
            - name: trino-tmp-data
              mountPath: /tmp
          resources:
            requests:
              #memory: "64G"
              memory: "1G"
              #cpu: 12
              cpu: 1
          imagePullPolicy: Always
      volumes:
        - name: trino-cfg-vol
          configMap:
            name: trino-configs
  volumeClaimTemplates:
    - metadata:
        name: trino-tmp-data
      spec:
        #storageClassName: pure-block
        storageClassName: rook-ceph-block
        accessModes:
          - ReadWriteOnce
        resources:
          requests:
            #storage: 8Ti
            storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: trino-cli
  namespace: trino
spec:
  containers:
    - name: trino-cli
      image: hgkim.repo/library/trinodb/trino:365
      command: ["tail", "-f", "/dev/null"]
      imagePullPolicy: Always
  restartPolicy: Always
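One thing to keep in mind with the manifest above: every config file is mounted from the ConfigMap with subPath, and Kubernetes does not refresh subPath mounts when the ConfigMap changes. So after editing trino-cfgs.yaml the pods need a restart. A minimal sketch of that, where reload_trino_config is a hypothetical helper name:

```shell
# Re-apply the ConfigMap and restart coordinator and workers so they pick it up.
# reload_trino_config is a hypothetical helper name.
reload_trino_config() {
  kubectl apply -n trino -f trino-cfgs.yaml || return 1
  # subPath mounts are not updated in place, so roll the pods.
  kubectl rollout restart -n trino deployment/trino-coordinator || return 1
  kubectl rollout restart -n trino statefulset/trino-worker
}
```

Running queries are lost during the restart, which is fine for a test cluster like this one.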
Test
CLI
$ kubectl exec -it trino-cli -n trino -- trino --server trino:8080 --catalog hive --schema default
trino:default> show tables;
 Table
-------
(0 rows)

Query 20220225_070005_00003_a68ud, FINISHED, 3 nodes
Splits: 12 total, 12 done (100.00%)
0.23 [0 rows, 0B] [0 rows/s, 0B/s]
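For scripted checks, the same CLI can run a single statement without opening the interactive prompt. A minimal sketch, assuming the trino-cli pod and the trino Service on port 8080 from the manifests above; trino_query is a hypothetical helper name, and --execute is the Trino CLI option for running one statement and exiting:

```shell
# Run one SQL statement through the trino-cli pod and print the result.
# trino_query is a hypothetical helper name.
trino_query() {
  kubectl exec trino-cli -n trino -- \
    trino --server trino:8080 --catalog hive --schema default --execute "$1"
}
```

For example, trino_query 'show tables' lists the tables in hive.default.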
DBeaver
jdbcurl  : jdbc:trino://10.240.35.32:30024/hive/default
host     : 10.240.35.32
port     : 30024
database : hive/default
username : admin
jdbc     : https://repo1.maven.org/maven2/io/trino/trino-jdbc/371/trino-jdbc-371.jar
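The JDBC URL follows the pattern jdbc:trino://host:port/catalog/schema, which is why DBeaver's database field holds hive/default. A small sketch that assembles it; the helper name trino_jdbc_url is hypothetical, and the values in the example call are the ones from this post:

```shell
# Assemble a Trino JDBC URL from host, port, catalog and schema.
# trino_jdbc_url is a hypothetical helper name.
trino_jdbc_url() {
  printf 'jdbc:trino://%s:%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

trino_jdbc_url 10.240.35.32 30024 hive default
```

Swapping the catalog argument for mysql or iceberg should point the same connection at the other catalogs defined in the ConfigMap.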
It's convenient because the tables I created while testing STS (Spark Thrift Server) carry over as-is,
since both share the same metastore and MySQL.
Misc
It even creates a PVC for each worker, which is a nice touch.
$ k get pvc -n trino
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
trino-tmp-data-trino-worker-0   Bound    pvc-7a50b6e0-a27c-4305-a347-39bcc509ec11   2Gi        RWO            rook-ceph-block   89m
trino-tmp-data-trino-worker-1   Bound    pvc-85778341-9e29-4422-b4b7-dd2ff22044d7   2Gi        RWO            rook-ceph-block   75m
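The PVC names come from the StatefulSet's volumeClaimTemplates: each one is the claim template name, then the StatefulSet name, then the pod ordinal. A short sketch that prints the expected names for a given replica count; pvc_names is a hypothetical helper name:

```shell
# Print the PVC names a StatefulSet creates: <claim-template>-<statefulset>-<ordinal>.
# pvc_names is a hypothetical helper name.
pvc_names() {
  template=$1
  sts=$2
  replicas=$3
  i=0
  while [ "$i" -lt "$replicas" ]; do
    echo "${template}-${sts}-${i}"
    i=$((i + 1))
  done
}

pvc_names trino-tmp-data trino-worker 2
```

By default these PVCs stay around when the StatefulSet is scaled down or deleted, so they have to be removed separately with kubectl delete pvc when the test is over.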