overwriting
Advantage of overwriting files, instead of new creation:
- ACID compliant
- allow time travel
- faster
M1 CREATE OR REPLACE TABLE (whole table changed)
CREATE OR REPLACE TABLE orders AS
SELECT * FROM parquet.`${dataset.bookstore}/orders`
M2 INSERT OVERWRITE (data replaced by the data inserted) (schema must be matched)
INSERT OVERWRITE orders
SELECT * FROM parquet.`${dataset.bookstore}/orders`
Appending:
INSERT INTO orders
SELECT * FROM parquet.`${dataset.bookstore}/orders-new`
Merging Data/ UPSERT (won’t insert dumplicated row):
MERGE INTO customers c
USING customers_updates u
ON c.customer_id = u.customer_id
WHEN MATCHED AND c.email IS NULL AND u.email IS NOT NULL THEN
UPDATE SET email = u.email, updated = u.updated
WHEN NOT MATCHED THEN INSERT *
--updated email when the exisitngs id has a null email, insert if that id is not exist