Without clustering, Snowflake does not actively optimize the distribution of data across Snowflake - Micro-Partition (MPs) — it’s essentially “append-only in the order data arrives.”
Clustering change the distribution of data across micro-partitions so that the data with close/ same values can collocate together within the micro-partition.
Two Important Concepts:
1️⃣ Cluster Depth
- Measures how many MPs a single key value spans on average.
- Calculated by MPs containing the key/ Total no. of MPs
- Low depth → the value is concentrated in few micro-partitions → efficient pruning.
- High depth → the value is spread across many partitions → more scanning required.
- for discrete values (ID, categories)
2️⃣ Cluster Overlap
- Measures how much the key value ranges of micro-partitions overlap.
- Calculated by Range Overlap across MPs/ Total Range
- Low overlap → each micro-partition covers a tight, distinct range of values → pruning works well.
- High overlap → partitions cover overlapping key ranges → queries read more MPs than necessary.
- for ordered (date, numeric)
To view/monitor the clustering metadata for a table, Snowflake provides the following system functions:
- SYSTEM$CLUSTERING_DEPTH
- SYSTEM$CLUSTERING_INFORMATION (including clustering depth)