Unity Catalog

  • Centralized governance solution across all your workspaces on any cloud.
  • Unified governance for all data and AI assets
    • files, tables, machine learning models, dashboards
    • governance model is based on SQL

  • Set configuration and rules once for all workspaces
  • The metastore is separated from the workspace and managed through the account console, where it can be assigned to one or more workspaces

UC metastore != Hive metastore

The Hive metastore is tied to a single workspace. The UC metastore sits above the workspace level and offers improved security and advanced features.

UC adds a third level to the namespace:

-- from (two-level namespace)
SELECT * FROM schema.table
-- to (three-level namespace)
SELECT * FROM catalog.schema.table
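
As a small illustration of how the three-level namespace resolves, a session can set a default catalog and schema so that shorter names still work. This is only a sketch: `main`, `sales`, and `orders` are hypothetical names that do not come from these notes.

```sql
-- Hypothetical objects: catalog `main`, schema `sales`, table `orders`.
USE CATALOG main;   -- set the default catalog for this session
USE SCHEMA sales;   -- set the default schema within that catalog

SELECT * FROM orders;             -- one-level name, resolved against the defaults
SELECT * FROM sales.orders;       -- two-level name, resolved against the default catalog
SELECT * FROM main.sales.orders;  -- fully qualified three-level name
```

Under these assumptions, all three queries read the same table. The fully qualified form is the safest choice in shared notebooks, since it does not depend on session defaults.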

Three types of identities:

  • Users: identified by e-mail addresses
    • can hold the account administrator role
  • Service Principals: identified by Application IDs
    • can also be given administrative privileges
  • Groups: collections of Users and Service Principals
    • groups can be nested within other groups
    • e.g. HR and Finance groups nested within an Employees group

Identity Federation

  • Identities can be created once in the account console, then assigned to workspaces
  • Therefore, there is no need to configure them individually for each workspace
  1. Create the identity at the account level
  2. Assign it at the workspace level

Privileges in UC:

  • CREATE
  • USAGE
  • SELECT
  • MODIFY
  • READ FILES
  • WRITE FILES
  • EXECUTE (allows executing user-defined functions)
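
A hedged sketch of how such privileges are typically granted with SQL follows. The names `main`, `sales`, `orders`, and the group `analysts` are hypothetical. The sketch also assumes a recent Unity Catalog release, where the USAGE privilege listed above is expressed as USE CATALOG and USE SCHEMA.

```sql
-- Hypothetical objects: catalog `main`, schema `sales`, table `orders`, group `analysts`.

-- Reading a table requires access to its parent catalog and schema as well.
GRANT USE CATALOG ON CATALOG main TO `analysts`;
GRANT USE SCHEMA ON SCHEMA main.sales TO `analysts`;
GRANT SELECT ON TABLE main.sales.orders TO `analysts`;

-- Inspect which privileges are currently granted on the table.
SHOW GRANTS ON TABLE main.sales.orders;

-- Remove the read privilege again.
REVOKE SELECT ON TABLE main.sales.orders FROM `analysts`;
```

Granting to a group rather than to individual users keeps the rules manageable, since membership changes do not require new grants.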

Accessing the legacy Hive metastore

  • The Hive metastore can still be accessed even when Unity Catalog is enabled
  • It still exists locally in each individual workspace
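
For example, assuming the workspace exposes its legacy metastore under the catalog name `hive_metastore` (the usual setup), legacy tables can be queried with the same three-level syntax. The table name `legacy_table` is hypothetical.

```sql
-- Assumption: the legacy Hive metastore appears as the catalog `hive_metastore`.
-- `legacy_table` is a hypothetical table in the `default` schema.
SELECT * FROM hive_metastore.default.legacy_table;
```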

Features of Unity Catalog:

  • Centralized governance for data and AI
  • Built-in data search and discovery
  • Automated lineage
  • No hard migration required when it is enabled