Growing as a 'Data Engineer'

I like writing things down, and I like it even more when someone else reads them


Kubernetes) Installing Trino with YAML

MightyTedKim 2022. 2. 25. 16:58

The Spark Thrift Server was hard to manage, so I started studying Trino.

 


 

If you've already set up the Thrift server, you can reuse the metastore and MySQL as-is, so you'll get to "hello world" quickly.

 

https://github.com/joshuarobinson/trino-on-k8s

https://joshua-robinson.medium.com/presto-powered-s3-data-warehouse-on-kubernetes-aea89d2f40e8

Result

$ k get all -n trino
NAME                                     READY   STATUS    RESTARTS   AGE
pod/trino-cli                            1/1     Running   0          35d
pod/trino-coordinator-574c748c86-j56pt   1/1     Running   0          35d
pod/trino-worker-0                       1/1     Running   0          35d
pod/trino-worker-1                       1/1     Running   0          35d
pod/trino-worker-2                       1/1     Running   0          35d

NAME            TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)    AGE
service/trino   ClusterIP   10.233.8.251   <none>        8080/TCP   35d

NAME                                READY   UP-TO-DATE   AVAILABLE   AGE
deployment.apps/trino-coordinator   1/1     1            1           35d

NAME                                           DESIRED   CURRENT   READY   AGE
replicaset.apps/trino-coordinator-574c748c86   1         1         1       35d

NAME                            READY   AGE
statefulset.apps/trino-worker   3/3     35d

Files

Only four files are needed:

  1. s3-secret.yaml
  2. trino-cfgs.yaml 
  3. trino.yaml
  4. trino-svc-np.yaml 
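
The whole setup can be sketched as a short apply sequence (assuming the `trino` namespace doesn't exist yet; note the ConfigMap manifest above has no `namespace` in its metadata, so it needs `-n trino` on the command line):

```shell
# Create the namespace, then apply the four manifests in order
kubectl create namespace trino
kubectl apply -f s3-secret.yaml
kubectl apply -f trino-cfgs.yaml -n trino   # ConfigMap metadata omits the namespace
kubectl apply -f trino.yaml
kubectl apply -f trino-svc-np.yaml
```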

 

Just apply them in order. I ran the commands below.

Create the Secret first:

$ cat s3-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: my-s3-keys
  namespace: trino
type: Opaque
data:
  access-key: ****==
  secret-key: ****==
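
The `access-key` / `secret-key` values go in base64-encoded (the `****==` above is masked). With placeholder credentials, they can be produced like this:

```shell
# Base64-encode placeholder credentials for the Secret's data fields.
# -n keeps a trailing newline out of the encoded value.
echo -n 'MY_ACCESS_KEY' | base64
echo -n 'MY_SECRET_KEY' | base64

# Or skip hand-editing the YAML and let kubectl do the encoding:
# kubectl create secret generic my-s3-keys -n trino \
#   --from-literal=access-key='MY_ACCESS_KEY' \
#   --from-literal=secret-key='MY_SECRET_KEY'
```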

The coordinator is both the JDBC endpoint and the Web UI, so expose it via a NodePort:

$ cat trino-svc-np.yaml
apiVersion: v1
kind: Service
metadata:
  name: trino
  namespace: trino
spec:
  type: NodePort
  selector:
    app: trino-coordinator
  ports:
  - name: trino-ui
    nodePort: 30099
    port: 8080

Then create the ConfigMap:

$ cat trino-cfgs.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: trino-configs
data:
  jvm.config: |-
    -server
    -Xmx16G
    -XX:-UseBiasedLocking
    -XX:+UseG1GC
    -XX:G1HeapRegionSize=32M
    -XX:+ExplicitGCInvokesConcurrent
    -XX:+ExitOnOutOfMemoryError
    -XX:+UseGCOverheadLimit
    -XX:+HeapDumpOnOutOfMemoryError
    -XX:ReservedCodeCacheSize=512M
    -Djdk.attach.allowAttachSelf=true
    -Djdk.nio.maxCachedBufferSize=2000000
  config.properties.coordinator: |-
    coordinator=true
    node-scheduler.include-coordinator=false
    http-server.http.port=8080
    query.max-memory=200GB
    query.max-memory-per-node=8GB
    query.max-total-memory-per-node=10GB
    query.max-stage-count=200
    task.writer-count=4
    discovery-server.enabled=true
    discovery.uri=http://trino:8080
  config.properties.worker: |-
    coordinator=false
    http-server.http.port=8080
    query.max-memory=200GB
    query.max-memory-per-node=10GB
    query.max-total-memory-per-node=10GB
    query.max-stage-count=200
    task.writer-count=4
    discovery.uri=http://trino:8080
  node.properties: |-
    node.environment=test
    spiller-spill-path=/tmp
    max-spill-per-node=4TB
    query-max-spill-per-node=1TB
  hive.properties: |-
    connector.name=hive-hadoop2
    #hive.metastore.uri=thrift://metastore:9083
    hive.metastore.uri=thrift://10.100.210.35:9083
    hive.allow-drop-table=true
    hive.max-partitions-per-scan=1000000
    #hive.s3.endpoint=10.62.64.200
    hive.s3.endpoint=10.106.47.55
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
    hive.s3.max-connections=100
  iceberg.properties: |-
    connector.name=iceberg
    hive.metastore.uri=thrift://metastore:9083
    hive.max-partitions-per-scan=1000000
    hive.s3.endpoint=10.62.64.200
    hive.s3.path-style-access=true
    hive.s3.ssl.enabled=false
    hive.s3.max-connections=100
  mysql.properties: |-
    connector.name=mysql
    #connection-url=jdbc:mysql://metastore-db.default.svc.cluster.local:13306
    connection-url=jdbc:mysql://10.106.121.238:3306
    #connection-user=root
    connection-user=root
    #connection-password=mypass
    connection-password=hgkimpwd
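
Since each key is mounted as its own file via `subPath`, you can sanity-check what the pods actually see once the Deployment below is running (resource names from the manifests in this post):

```shell
# Confirm the coordinator got the coordinator-flavored config.properties
kubectl exec -n trino deploy/trino-coordinator -- cat /etc/trino/config.properties

# ...and that the catalog files landed where Trino looks for them
kubectl exec -n trino deploy/trino-coordinator -- ls /etc/trino/catalog
```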

Finally, deploy Trino itself. Since this is just for testing, I set the resource requests low.

(putting this one in a code block because Tistory kept throwing an error)
$ cat trino.yaml
---
apiVersion: v1
kind: Service
metadata:
  name: trino
  namespace: trino
spec:
  ports:
  - port: 8080
  selector:
    app: trino-coordinator
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trino-coordinator
  namespace: trino
spec:
  selector:
    matchLabels:
      app: trino-coordinator
  strategy:
    type: Recreate
  template:
    metadata:
      labels:
        app: trino-coordinator
    spec:
      containers:
      - name: trino
        #image: trinodb/trino:355
        image: hgkim.repo/library/trinodb/trino:365
        ports:
        - containerPort: 8080
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: my-s3-keys
              key: access-key
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: my-s3-keys
              key: secret-key
        volumeMounts:
        - name: trino-cfg-vol
          mountPath: /etc/trino/jvm.config
          subPath: jvm.config
        - name: trino-cfg-vol
          mountPath: /etc/trino/config.properties
          subPath: config.properties.coordinator
        - name: trino-cfg-vol
          mountPath: /etc/trino/node.properties
          subPath: node.properties
        - name: trino-cfg-vol
          mountPath: /etc/trino/catalog/hive.properties
          subPath: hive.properties
        - name: trino-cfg-vol
          mountPath: /etc/trino/catalog/iceberg.properties
          subPath: iceberg.properties
        - name: trino-cfg-vol
          mountPath: /etc/trino/catalog/mysql.properties
          subPath: mysql.properties
        resources:
          requests:
            memory: "4G"
            #memory: "16G"
            #cpu: 4
            cpu: 1
        imagePullPolicy: Always
      volumes:
        - name: trino-cfg-vol
          configMap:
            name: trino-configs
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: trino-worker
  namespace: trino
spec:
  serviceName: trino-worker
  #replicas: 10
  replicas: 2
  selector:
    matchLabels:
      app: trino-worker
  template:
    metadata:
      labels:
        app: trino-worker
    spec:
      securityContext:
        fsGroup: 1000
      containers:
      - name: trino
        #image: trinodb/trino:355
        image: hgkim.repo/library/trinodb/trino:365
        ports:
        - containerPort: 8080
        env:
        - name: AWS_ACCESS_KEY_ID
          valueFrom:
            secretKeyRef:
              name: my-s3-keys
              key: access-key
        - name: AWS_SECRET_ACCESS_KEY
          valueFrom:
            secretKeyRef:
              name: my-s3-keys
              key: secret-key
        volumeMounts:
        - name: trino-cfg-vol
          mountPath: /etc/trino/jvm.config
          subPath: jvm.config
        - name: trino-cfg-vol
          mountPath: /etc/trino/config.properties
          subPath: config.properties.worker
        - name: trino-cfg-vol
          mountPath: /etc/trino/node.properties
          subPath: node.properties
        - name: trino-cfg-vol
          mountPath: /etc/trino/catalog/hive.properties
          subPath: hive.properties
        - name: trino-cfg-vol
          mountPath: /etc/trino/catalog/iceberg.properties
          subPath: iceberg.properties
        - name: trino-cfg-vol
          mountPath: /etc/trino/catalog/mysql.properties
          subPath: mysql.properties
        - name: trino-tmp-data
          mountPath: /tmp
        resources:
          requests:
            #memory: "64G"
            memory: "1G"
            #cpu: 12
            cpu: 1
        imagePullPolicy: Always
      volumes:
        - name: trino-cfg-vol
          configMap:
            name: trino-configs
  volumeClaimTemplates:
  - metadata:
      name: trino-tmp-data
    spec:
      #storageClassName: pure-block
      storageClassName: rook-ceph-block
      accessModes:
      - ReadWriteOnce
      resources:
        requests:
          #storage: 8Ti
          storage: 2Gi
---
apiVersion: v1
kind: Pod
metadata:
  name: trino-cli
  namespace: trino
spec:
  containers:
  - name: trino-cli
    image: hgkim.repo/library/trinodb/trino:365
    command: ["tail", "-f", "/dev/null"]
    imagePullPolicy: Always
  restartPolicy: Always
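
After applying, it's worth waiting for everything to come up before testing (resource names from the manifests above; the UI port is the NodePort `30099` from trino-svc-np.yaml):

```shell
# Block until the coordinator and all workers are ready
kubectl rollout status deployment/trino-coordinator -n trino
kubectl rollout status statefulset/trino-worker -n trino

# The Web UI at http://<node-ip>:30099/ui/ should then list the workers
```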
 

Testing

CLI

$ kubectl exec -it trino-cli -n trino -- trino --server http://trino:8080 --catalog hive --schema default
trino:default> show tables;
 Table
-------
(0 rows)

Query 20220225_070005_00003_a68ud, FINISHED, 3 nodes
Splits: 12 total, 12 done (100.00%)
0.23 [0 rows, 0B] [0 rows/s, 0B/s]
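
For scripted checks, the Trino CLI can also run a single statement with `--execute` instead of opening an interactive session (`--server` is needed because the CLI pod is separate from the coordinator):

```shell
# One-shot query through the trino-cli pod
kubectl exec trino-cli -n trino -- \
  trino --server http://trino:8080 --catalog hive --schema default \
        --execute 'SHOW SCHEMAS'
```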

DBeaver

jdbcurl : jdbc:trino://10.240.35.32:30024/hive/default
host : 10.240.35.32
port : 30024
database : hive/default
username : admin
jdbc : https://repo1.maven.org/maven2/io/trino/trino-jdbc/371/trino-jdbc-371.jar
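
If you just want to confirm the coordinator answers without setting up DBeaver, Trino's HTTP API can be poked with curl (host/port from the connection info above; `admin` is an arbitrary user name since no authentication is configured here):

```shell
# Submit a statement; the response is JSON containing a nextUri to poll for results
curl -s -X POST http://10.240.35.32:30024/v1/statement \
  -H 'X-Trino-User: admin' \
  -d 'SELECT 1'
```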

The tables I created while testing the Spark Thrift Server come over as-is, which is convenient,

since the metastore and MySQL are shared.

Misc

It even creates a PVC per worker, how thoughtful :')
$ k get pvc -n trino
NAME                            STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
trino-tmp-data-trino-worker-0   Bound    pvc-7a50b6e0-a27c-4305-a347-39bcc509ec11   2Gi        RWO            rook-ceph-block   89m
trino-tmp-data-trino-worker-1   Bound    pvc-85778341-9e29-4422-b4b7-dd2ff22044d7   2Gi        RWO            rook-ceph-block   75m
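
One caveat: deleting the StatefulSet does not delete these PVCs. If you tear the setup down, remove them explicitly; they follow the `<template-name>-<statefulset>-<ordinal>` naming pattern visible above:

```shell
# Workers first, then their leftover spill volumes
kubectl delete statefulset trino-worker -n trino
kubectl delete pvc trino-tmp-data-trino-worker-0 trino-tmp-data-trino-worker-1 -n trino
```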

 
