## Using Spark SQL
The most common way to work with data in delta tables in Spark is to use Spark SQL. You can embed SQL statements in other languages, such as PySpark or Scala, by using the `spark.sql` function. For example, the following code inserts a row into the `products` table.
```python
spark.sql("INSERT INTO products VALUES (1, 'Widget', 'Accessories', 2.99)")
```
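Because `spark.sql` returns a dataframe, you can also use it to read data back out of the table. The following is a minimal sketch, assuming the `products` catalog table created above exists:

```python
# Query the catalog table and return the results as a dataframe
# (assumes the products table created earlier exists in the catalog)
df = spark.sql("SELECT * FROM products")
df.show()
```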
Alternatively, you can use the `%%sql` magic in a notebook cell to run SQL statements directly:

```sql
%%sql

UPDATE products
SET Price = 2.49 WHERE ProductId = 1;
```
## Use the Delta API
If you want to work with delta files rather than catalog tables, it may be simpler to use the Delta Lake API. You can create a `DeltaTable` object from a folder location containing files in delta format, and then use the API to modify the data in the table:
```python
from delta.tables import *
from pyspark.sql.functions import *

# Create a DeltaTable object
delta_path = "Files/mytable"
deltaTable = DeltaTable.forPath(spark, delta_path)

# Update the table (reduce price of accessories by 10%)
deltaTable.update(
    condition = "Category == 'Accessories'",
    set = { "Price": "Price * 0.9" })
```
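After modifying the table, you can read its current contents back as a dataframe by using the `toDF` method. A minimal sketch, using the `deltaTable` object created above:

```python
# Read the current state of the delta table into a dataframe and display it
updated_df = deltaTable.toDF()
updated_df.show()
```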
## Use _time travel_ to work with table versioning
Modifications made to delta tables are logged in the transaction log for the table. You can use the logged transactions to view the history of changes made to the table and to retrieve older versions of the data (known as _time travel_).
To see the history of a table, you can use the `DESCRIBE HISTORY` SQL command, as shown here:
```sql
%%sql

DESCRIBE HISTORY products
```
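If you're working with the Delta Lake API rather than SQL, the `history` method of a `DeltaTable` object returns the same information as a dataframe. A minimal sketch, assuming the `deltaTable` object created in the previous section:

```python
# Retrieve the transaction history of the delta table as a dataframe
# (assumes the deltaTable object created in the previous section)
history_df = deltaTable.history()
history_df.select("version", "timestamp", "operation").show(truncate=False)
```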
You can retrieve data from a specific version of the data by reading it from the delta table location into a dataframe, specifying the required version with the `versionAsOf` option:
```python
df = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)
```
Alternatively, you can retrieve data as it was at a specific point in time by specifying a timestamp with the `timestampAsOf` option:
```python
df = spark.read.format("delta").option("timestampAsOf", '2022-01-01').load(delta_path)
```
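Time travel is useful for comparing the current data with an earlier version. As a minimal sketch (assuming the delta table at `delta_path` has more than one version), you could compare the row counts of the current version and the original version:

```python
# Compare the current version of the table with version 0
# (assumes the delta table at delta_path has more than one version)
current_df = spark.read.format("delta").load(delta_path)
original_df = spark.read.format("delta").option("versionAsOf", 0).load(delta_path)
print(f"Current rows: {current_df.count()}, rows in version 0: {original_df.count()}")
```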