3.4 Higher Order Functions & SQL UDFs

A higher order function is a function that takes one or more functions as arguments, or returns a function as its result In Databrick, higher order function allow you to work directly with hierarchical data like arrays and map type objects.

Two useful Higher Order Functions

filter()

filter array by a given lambda function (i just refers to books)

--example books columns: [{"book_id":"B09","quantity":2,"subtotal":48}]
--filter the books column where quantity>=2:

SELECT

  order_id,
  books,
  FILTER (books, i -> i.quantity >= 2
	) AS multiple_copies

FROM orders

TRANSFORM()

transform array by a given lambda function

SELECT

  order_id,

  books,

  TRANSFORM (

    books, b -> CAST(b.subtotal * 0.8 AS INT)

  ) AS subtotal_after_discount

FROM orders;

User-defined functions

allow you to register a custom combination of SQL logic as function, which can be reused
leverage spark SQL directly & maintain all optimization
UDFs are permanent objects that are persisted to the database (ok for different session/ notebooks)

-- 3 parts needed: function name, returned type and logic

CREATE OR REPLACE FUNCTION get_url(email STRING)
RETURNS STRING
RETURN concat("https://www.", split(email, "@")[1])

see details of the UDFs by:

DESCRIBE FUNCTION EXTENDED get_url

drop function by:

DROP FUNCTION get_url;

Stanley Chan's Note🧠

Explorer

3.4 Higher Order Functions & SQL UDFs

Two useful Higher Order Functions

filter()

User-defined functions

Graph View

Table of Contents

Backlinks