Using APPEND mode to load data that arrives monthly.
Since the data volume is small, coalesce(1) is applied.
import os
from pyspark.sql import SparkSession

# Point Spark at the local JDK installation.
os.environ['JAVA_HOME'] = '/home/java/jdk1.8.0_301'
columns = ['amount', 'id']
spark = SparkSession.builder.getOrCreate()
vals = [(111, 1), (222, 2)]
df1 = spark.createDataFrame(vals, columns)
print(type(df1))
df1.printSchema()
print("saveorigin")
# Initial load: overwrite any existing data; coalesce(1) writes a single part file.
df1.coalesce(1).write.mode("overwrite").parquet("./tmp1/test2.parquet")
print("read origin")
df_read = spark.read.parquet("./tmp1/test2.parquet")
df_read.show()
print("append")
df2 = spark.createDataFrame([(444, 4)], columns)
df2.show()
# Monthly increment: append adds new part files without touching the existing data.
df2.coalesce(1).write.mode("append").parquet("./tmp1/test2.parquet")
print("show df_read2")
df_read2 = spark.read.parquet("./tmp1/test2.parquet")
df_read2.show()
## Result ##
<class 'pyspark.sql.dataframe.DataFrame'>
root
|-- amount: long (nullable = true)
|-- id: long (nullable = true)
saveorigin
read origin
+------+---+
|amount| id|
+------+---+
| 222| 2|
| 111| 1|
+------+---+
append
+------+---+
|amount| id|
+------+---+
| 444| 4|
+------+---+
show df_read2
+------+---+
|amount| id|
+------+---+
| 222| 2|
| 111| 1|
| 444| 4|
+------+---+