overwriting

Advantage of overwriting files, instead of new creation:

  • ACID compliant
  • allow time travel
  • faster

M1 CREATE OR REPLACE TABLE (whole table changed)

CREATE OR REPLACE TABLE orders AS
SELECT * FROM parquet.`${dataset.bookstore}/orders`

M2 INSERT OVERWRITE (data replaced by the data inserted) (schema must be matched)

INSERT OVERWRITE orders
SELECT * FROM parquet.`${dataset.bookstore}/orders`

Appending:

INSERT INTO orders
SELECT * FROM parquet.`${dataset.bookstore}/orders-new`

Merging Data/ UPSERT (won’t insert dumplicated row):

MERGE INTO customers c

USING customers_updates u

ON c.customer_id = u.customer_id

WHEN MATCHED AND c.email IS NULL AND u.email IS NOT NULL THEN
  UPDATE SET email = u.email, updated = u.updated
WHEN NOT MATCHED THEN INSERT *

--updated email when the exisitngs id has a null email, insert if that id is not exist