All data in Snowflake tables is automatically divided into micro-partitions, which are contiguous units of storage. Each micro-partition contains between 50 MB and 500 MB of uncompressed data (the actual stored size is smaller, because Snowflake always stores data compressed).

Snowflake stores metadata about all rows stored in a micro-partition, including:

  • The range of values for each of the columns in the micro-partition
  • The number of distinct values
  • Additional properties used for both optimization and efficient query processing.

The micro-partition metadata maintained by Snowflake enables precise pruning of columns in micro-partitions at query run-time, including columns containing semi-structured data. In other words, a query that specifies a filter predicate on a range of values that accesses 10% of the values in the range should ideally only scan 10% of the micro-partitions.
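The idea behind metadata-based pruning can be sketched in a few lines of Python. This is an illustrative simulation, not Snowflake internals: the partition contents, the single `ts` column, and the function names are invented for the example. The point is that only the per-partition min/max metadata is consulted to decide which partitions a range predicate must scan.

```python
# Sketch: how per-partition min/max metadata enables pruning.
# Partition contents and column values are illustrative only.

def build_metadata(partitions):
    """Record the min/max of a column for each micro-partition."""
    return [(min(p), max(p)) for p in partitions]

def prune(metadata, lo, hi):
    """Return indices of partitions whose [min, max] overlaps [lo, hi];
    all other partitions are skipped without being read."""
    return [i for i, (mn, mx) in enumerate(metadata)
            if mx >= lo and mn <= hi]

# Four micro-partitions of a time-series column, loaded in order:
partitions = [[1, 2, 3], [4, 5, 6], [7, 8, 9], [10, 11, 12]]
meta = build_metadata(partitions)

# Filter predicate: ts BETWEEN 7 AND 9 -> only partition 2 is scanned.
print(prune(meta, 7, 9))  # [2]
```

Because the data arrived in time order, the min/max ranges do not overlap, and a predicate covering roughly 10% of the value range touches roughly 10% of the partitions.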

Tips / Intuition

Micro-partitions are immutable: once created, they cannot be changed. New micro-partitions are created as new data is loaded in.

This is not a problem when newly loaded data simply creates new partitions (e.g. time-series appends). But if the workload involves many DML operations (e.g. updates/deletes), existing micro-partitions must be rewritten, which hurts performance.

To improve pruning performance on micro-partitions, we can use Snowflake clustering (e.g. by defining a clustering key on the table).
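Why clustering helps can be shown with a small simulation. This is a hedged sketch, not Snowflake's implementation: clustering is approximated by simply sorting rows on the filter column before splitting them into fixed-size partitions, and all values and sizes are invented. Unsorted data scatters values across every partition, so the min/max ranges all overlap the predicate and nothing can be pruned; clustered data yields tight, disjoint ranges.

```python
# Sketch: clustering tightens per-partition min/max ranges,
# letting more partitions be skipped. Values are illustrative only.

def partition(rows, size):
    """Split rows into fixed-size 'micro-partitions'."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]

def prunable(partitions, lo, hi):
    """Partitions that can be skipped for a filter on [lo, hi]."""
    return [p for p in partitions if max(p) < lo or min(p) > hi]

rows = [3, 12, 7, 1, 9, 5, 11, 2, 8, 4, 10, 6]

unclustered = partition(rows, 4)        # values scattered everywhere
clustered = partition(sorted(rows), 4)  # "clustered" by the filter column

# Filter: value BETWEEN 1 AND 4
print(len(prunable(unclustered, 1, 4)))  # 0 -> every partition scanned
print(len(prunable(clustered, 1, 4)))    # 2 -> 2 of 3 partitions skipped
```

In real Snowflake, declaring a clustering key asks the service to keep data physically co-located by those columns, producing the same effect as the sorted case above.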