Stanley Chan's Note🧠

1.1 Databricks intro

Aug 16, 2025

Databricks is split into two high-level planes:

Control plane (runs in the Databricks account)
- Web UI
- Cluster management
- Workflows
- Notebooks
Data plane (runs in your own cloud account)
- Cluster VMs for compute
- Storage
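
To make the split concrete, here is a toy lookup table that maps each component listed above to the plane it lives in. It is only an illustration of the note's two lists: `Plane`, `COMPONENT_PLANE`, `plane_of` and `runs_in_customer_account` are hypothetical names, not part of any Databricks API.

```python
from enum import Enum


class Plane(Enum):
    CONTROL = "control plane (Databricks account)"
    DATA = "data plane (your own cloud account)"


# Hypothetical lookup table built from the two lists above.
COMPONENT_PLANE = {
    "web ui": Plane.CONTROL,
    "cluster management": Plane.CONTROL,
    "workflows": Plane.CONTROL,
    "notebooks": Plane.CONTROL,
    "cluster vms": Plane.DATA,
    "storage": Plane.DATA,
}


def plane_of(component: str) -> Plane:
    """Return the plane a component lives in (case-insensitive lookup)."""
    key = component.strip().lower()
    if key not in COMPONENT_PLANE:
        raise ValueError(f"unknown component: {component!r}")
    return COMPONENT_PLANE[key]


def runs_in_customer_account(component: str) -> bool:
    """True when the component runs inside your own cloud account."""
    return plane_of(component) is Plane.DATA


if __name__ == "__main__":
    for name in ["Notebooks", "Cluster VMs", "Storage"]:
        print(f"{name}: {plane_of(name).value}")
```

The practical takeaway: the compute VMs and the data sit in your cloud account, while the UI and orchestration are managed from the Databricks side.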

Spark on Databricks

In-memory, distributed data processing
Supports Scala, Python, SQL, R and Java
Supports both batch processing and stream processing
Handles structured, semi-structured and unstructured data
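
The "in-memory, distributed" idea can be sketched in plain Python: split the data into partitions, process each partition independently, then merge the partial results. This is a toy model of the processing pattern, not Spark's API; in PySpark the same word count would usually be written with DataFrame operations such as `groupBy(...).count()` or the RDD method `reduceByKey`. The function names below are hypothetical.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(items, num_partitions):
    """Round-robin the input into num_partitions in-memory lists."""
    if num_partitions < 1:
        raise ValueError("num_partitions must be >= 1")
    parts = [[] for _ in range(num_partitions)]
    for i, item in enumerate(items):
        parts[i % num_partitions].append(item)
    return parts


def count_words(lines):
    """Per-partition step: only needs the data in its own partition."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def word_count(lines, num_partitions=3):
    """Split, count each partition independently, then merge partial results."""
    partitions = split_into_partitions(lines, num_partitions)
    partials = [count_words(p) for p in partitions]  # independent, so parallelizable
    return reduce(lambda a, b: a + b, partials, Counter())


if __name__ == "__main__":
    counts = word_count(["Spark on Databricks", "Spark is in-memory"])
    print(counts["spark"])  # prints 2
```

Because each partition is counted without looking at the others, the per-partition step is what a cluster can spread across workers; only the small partial counts need to be combined.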

Databricks File System (DBFS)

Distributed file system
Pre-installed on Databricks clusters
Abstraction layer over the underlying cloud object storage (e.g. S3):
- Files created in DBFS from the cluster are actually stored in that cloud storage
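
In practice the same DBFS file is usually addressed two ways: Spark APIs use a `dbfs:/...` URI, while local file APIs on the driver typically go through the FUSE mount under `/dbfs/...` (where that mount is available on the compute type in use). The helpers below are a hypothetical sketch of that path mapping, not a Databricks utility.

```python
DBFS_SCHEME = "dbfs:/"
FUSE_ROOT = "/dbfs/"


def dbfs_to_local(uri: str) -> str:
    """'dbfs:/tmp/a.csv' -> '/dbfs/tmp/a.csv' (path for local file APIs)."""
    if not uri.startswith(DBFS_SCHEME):
        raise ValueError(f"not a dbfs:/ URI: {uri!r}")
    return FUSE_ROOT + uri[len(DBFS_SCHEME):].lstrip("/")


def local_to_dbfs(path: str) -> str:
    """'/dbfs/tmp/a.csv' -> 'dbfs:/tmp/a.csv' (URI for Spark APIs)."""
    if not path.startswith(FUSE_ROOT):
        raise ValueError(f"not under {FUSE_ROOT}: {path!r}")
    return DBFS_SCHEME + path[len(FUSE_ROOT):].lstrip("/")


if __name__ == "__main__":
    print(dbfs_to_local("dbfs:/tmp/a.csv"))  # prints /dbfs/tmp/a.csv
    print(local_to_dbfs("/dbfs/tmp/a.csv"))  # prints dbfs:/tmp/a.csv
```

So on a cluster, `spark.read.csv("dbfs:/tmp/a.csv")` and Python's `open("/dbfs/tmp/a.csv")` would normally point at the same object in the underlying cloud storage.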

Compute

Multi-node - a cluster composed of a driver (master) node, which coordinates the worker nodes for parallel execution of tasks, plus one or more worker nodes
Single node - no workers; Spark runs on the driver only
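
The difference between the two modes can be modelled with a toy "driver" that either hands tasks to a pool of workers or runs them all itself. Threads stand in for worker nodes here purely for illustration; real Spark executors are separate JVM processes on separate VMs. `run_job` and `chunk` are hypothetical helpers.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import partial


def run_job(tasks, num_workers):
    """Toy driver: run zero-arg callables, return results in submission order.

    num_workers == 0 models a single-node cluster: the driver runs every task.
    num_workers > 0 models a multi-node cluster: tasks go to a worker pool.
    """
    if num_workers < 0:
        raise ValueError("num_workers must be >= 0")
    if num_workers == 0:
        return [task() for task in tasks]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(task) for task in tasks]  # driver schedules tasks
        return [f.result() for f in futures]  # driver collects results


def chunk(data, size):
    """Split a list into consecutive slices of at most `size` items."""
    return [data[i:i + size] for i in range(0, len(data), size)]


if __name__ == "__main__":
    tasks = [partial(sum, c) for c in chunk(list(range(10)), 3)]
    print(run_job(tasks, num_workers=2))  # prints [3, 12, 21, 9]
    print(run_job(tasks, num_workers=0))  # prints [3, 12, 21, 9]
```

Both modes return the same answer; what changes is where the work runs, which is why single-node compute suits small jobs and exploration, while multi-node clusters pay off once the data outgrows one machine.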

Created with Quartz v4.2.3 © 2025
