728x90
반응형
pandas dataframe만 사용하다가, overwrite가 필요해서 pyspark dataframe을 활용함
내가 필요한건 upsert인데, delta table도 살펴봐야겠음
import json
from pyspark import SparkContext, SQLContext
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, StringType,IntegerType
import os
#java_home
os.environ['JAVA_HOME'] = '/home/java/jdk1.8.0_301'
columns = ['amount', 'id']
spark = SparkSession.builder.getOrCreate()
#df1 = spark.read.json("./orders.json")
vals = [(111, 1), (222, 2)]
df1 = spark.createDataFrame(vals, columns)
print(type(df1))
df1.printSchema()
print("show origing")
df1.show()
print("append")
newRow = spark.createDataFrame([(444,4)], columns)
newRow.show()
print("**upsert**")
df2 = df2.union(newRow)
print("show appended")
df2.show()
## 결과 ##
<class 'pyspark.sql.dataframe.DataFrame'>
root
|-- amount: long (nullable = true)
|-- id: long (nullable = true)
show origing
+------+---+
|amount| id|
+------+---+
| 111| 1|
| 222| 2|
+------+---+
append
+------+---+
|amount| id|
+------+---+
| 444| 4|
+------+---+
**upsert**
show appended
+------+---+
|amount| id|
+------+---+
| 111| 1|
| 222| 2|
| 444| 4|
+------+---+
728x90
반응형
'기타 > Python' 카테고리의 다른 글
Python) code 내에서 변수 초기화 (0) | 2022.11.22 |
---|---|
Python) parquet upsert with delta table (0) | 2021.10.19 |
Python) pyspark dataframe append (0) | 2021.10.13 |