Databricks Code Practice

Get fluent in Databricks by typing, not watching.

104 exercises + 5 production-grade pipeline labs. All on Databricks Free Edition.

Clone once, import into Databricks, pick a folder. Exercises fail loud until your code is right; labs ship with synthetic data so you build production-style pipelines, not toy ones.
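
To make "fail loud" concrete, here is a minimal sketch of what an assertion-based check could look like. The helper name `check_result` and the sample rows are hypothetical and not taken from the repo's notebooks; the actual exercises may validate results differently.

```python
def check_result(actual_rows, expected_rows):
    """Raise AssertionError with a readable diff if the rows do not match.

    Hypothetical helper for illustration; not the repo's actual checker.
    """
    # Compare as sorted lists so row order does not matter.
    actual = sorted(actual_rows)
    expected = sorted(expected_rows)
    missing = [row for row in expected if row not in actual]
    unexpected = [row for row in actual if row not in expected]
    assert actual == expected, (
        f"Result mismatch: missing={missing}, unexpected={unexpected}"
    )
    return True


# A correct answer passes, regardless of row order.
check_result([("b", 2), ("a", 1)], [("a", 1), ("b", 2)])

# A wrong answer raises immediately instead of passing quietly.
try:
    check_result([("a", 1)], [("a", 1), ("b", 2)])
except AssertionError as err:
    failure_message = str(err)
```

The point of the design is that a half-finished TODO cannot look finished: the cell either raises with a diff or it passes.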

New (18 April 2026): 5 full-scale pipeline labs + 1 benchmark deep-dive just landed. If you starred this repo for the exercises, they're still here - now alongside end-to-end project work.

Author

Jakub Lasak - Databricks Data Engineer. Helping you interview like seniors, execute like seniors, and think like seniors.

LinkedIn (13.5K followers) - Databricks projects and tips
Substack - Newsletter for data engineers
DataEngineer.wiki - Cheat sheets, learning paths, cert guides

Prepping for interviews? Writing code is one half of the battle - knowing the questions that actually come up is the other. I maintain Databricks Interview Cheat Sheets by seniority level (junior / mid / senior / bundle).

What's Inside

Fluency comes from reps, not reading. Three structured paths:

exercises/ - focused reps on a single concept. LeetCode-style, 5-30 min each.
pipeline-labs/ - end-to-end medallion pipelines on a business scenario. 2-3 hours each.
deep-dives/ - measure the impact of a technique with numbers. 1-2 hours each.

|  | Exercises | Pipeline Labs | Deep-Dives |
| --- | --- | --- | --- |
| Format | Single notebook, one TODO per exercise | Multi-notebook guided project | Single-topic deep investigation |
| Time | 5-30 min per exercise | 2-3 hours per lab | 1-2 hours |
| Scope | One concept (MERGE, window functions, ...) | End-to-end project (ingestion -> bronze -> silver -> gold) | One topic measured in depth |
| Narrative | None. "Given table X, write..." | Business scenario. "You're building a streaming pipeline for..." | Benchmark-driven. "Apply technique, measure the delta." |
| Order | Pick any, skip around | Sequential notebooks that build on each other | Sequential; each step layers on the last |
| Goal | Drill a skill until it's automatic | See how concepts fit in a real project | Prove what a technique actually buys you |

Catalog

Exercises (`exercises/`)

| Topic | Notebooks | Exercises | Description |
| --- | --- | --- | --- |
| Delta Lake | 6 | 51 | MERGE operations, time travel, schema enforcement, OPTIMIZE, liquid clustering, change data feed |
| ELT | 7 | 53 | Spark SQL joins, window functions, PySpark transformations, Auto Loader, batch ingestion, medallion architecture, complex data types |

Total: 13 notebooks, 104 exercises
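
As a flavor of the kind of concept a single exercise drills, the sketch below models MERGE (upsert) semantics in plain Python. It is an illustration of the logic only and does not call Delta Lake; the function name `merge_upsert` and the sample rows are hypothetical. In Databricks the same idea is expressed with a `MERGE INTO ... WHEN MATCHED ... WHEN NOT MATCHED ...` statement.

```python
def merge_upsert(target, source, key):
    """Model MERGE: update matched rows, insert unmatched ones.

    Plain-Python illustration of the semantics; not a Delta Lake API.
    """
    # Copy target rows so the input list is left untouched.
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        if row[key] in merged:
            merged[row[key]].update(row)  # WHEN MATCHED THEN UPDATE
        else:
            merged[row[key]] = dict(row)  # WHEN NOT MATCHED THEN INSERT
    return sorted(merged.values(), key=lambda r: r[key])


target = [{"id": 1, "city": "Oslo"}, {"id": 2, "city": "Rome"}]
source = [{"id": 2, "city": "Milan"}, {"id": 3, "city": "Lima"}]
result = merge_upsert(target, source, key="id")
```

Row 2 is updated, row 3 is inserted, and row 1 is left alone, which is the behavior a MERGE exercise would ask you to reproduce against a real table.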

More exercise topics coming - next up: Streaming, Unity Catalog, Performance, and DLT.

Pipeline Labs (`pipeline-labs/`)

Multi-notebook, end-to-end medallion pipelines with a business scenario. Each runs 2-3 hours and ships with a synthetic data generator.

| Lab | What You Build | Focus |
| --- | --- | --- |
| Apparel Retail 360 (DLT) | End-to-end retail analytics pipeline on Delta Live Tables with a full medallion architecture. | DLT, Medallion, SCD Type 2, Streaming, Data Quality Expectations |
| Fintech Transaction Monitoring | Real-time fraud-monitoring pipeline for a payment processor handling 500K+ transactions/day. | Structured Streaming, Rescued Data, Watermarked Dedup, Stream-Static Joins, Liquid Clustering |
| DE Associate Certification Prep | Production-grade pipeline covering every exam domain of the Databricks Data Engineer Associate cert. | Auto Loader, COPY INTO, Medallion, SCD2, Jobs, Unity Catalog |
| PySpark Developer Cert Prep | E-commerce analytics pipeline covering every domain of the Spark Developer Associate cert. | DataFrame API, Structured Streaming, Data Skew, Performance Tuning |
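
SCD Type 2 appears in more than one lab, so a rough sketch of the idea may help before you open the notebooks. The code below is a plain-Python model, not the labs' implementation: the function `apply_scd2` and the column names (`valid_from`, `valid_to`, `is_current`, `customer_id`, `tier`) are assumptions chosen for illustration.

```python
from datetime import date


def apply_scd2(history, updates, key, tracked, effective):
    """Close changed current rows and append new versions (SCD Type 2).

    Plain-Python model with assumed column names; not the labs' code.
    """
    result = [dict(row) for row in history]  # do not mutate the input
    current = {row[key]: row for row in result if row["is_current"]}
    for update in updates:
        existing = current.get(update[key])
        if existing and all(existing[c] == update[c] for c in tracked):
            continue  # tracked columns unchanged: keep the current version
        if existing:
            # Expire the old version instead of overwriting it.
            existing["is_current"] = False
            existing["valid_to"] = effective
        new_row = {**update, "valid_from": effective,
                   "valid_to": None, "is_current": True}
        result.append(new_row)
        current[update[key]] = new_row
    return result


history = [{"customer_id": 1, "tier": "silver",
            "valid_from": date(2025, 1, 1), "valid_to": None,
            "is_current": True}]
updates = [{"customer_id": 1, "tier": "gold"},
           {"customer_id": 2, "tier": "bronze"}]
versions = apply_scd2(history, updates, key="customer_id",
                      tracked=["tier"], effective=date(2025, 6, 1))
```

After the run, customer 1 has two rows (an expired "silver" and a current "gold") and customer 2 has one current row, so the full history stays queryable.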

Deep-Dives (`deep-dives/`)

Single-topic labs that measure the impact of a technique with numbers, not intuition.

| Lab | What You Build | Focus |
| --- | --- | --- |
| 6 Delta Optimization Techniques | Iteratively apply and measure core Delta performance levers on a synthetic 50M-row dataset. | Partitioning, Z-Order, OPTIMIZE, Auto Optimize, Liquid Clustering, VACUUM |
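
The "apply technique, measure the delta" loop can be sketched with a small timing harness. This is a generic illustration, not the deep-dive's own code: `benchmark` and `compare` are hypothetical helpers, and the two workloads are stand-ins (a list scan versus a set lookup) for a query before and after an optimization.

```python
import statistics
import time


def benchmark(fn, repeats=5):
    """Return the median wall-clock seconds over `repeats` runs of fn()."""
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        timings.append(time.perf_counter() - start)
    return statistics.median(timings)


def compare(baseline, candidate, repeats=5):
    """Time both callables and report the speedup of candidate over baseline."""
    base_s = benchmark(baseline, repeats)
    cand_s = benchmark(candidate, repeats)
    # Guard against a zero reading on coarse timers.
    speedup = base_s / cand_s if cand_s > 0 else float("inf")
    return {"baseline_s": base_s, "candidate_s": cand_s, "speedup": speedup}


# Stand-in workloads, not Delta operations: full scan vs. hashed lookup.
data = list(range(200_000))
lookup = set(data)
report = compare(lambda: 199_999 in data, lambda: 199_999 in lookup)
```

Using the median of several runs rather than a single timing reduces the effect of warm-up and noisy neighbors, which matters on shared serverless compute.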

How to Use

1. Sign up for Databricks Free Edition (free, no credit card).
2. Clone or import this repo into Databricks (Workspace -> Create -> Git folder).
3. Navigate to the folder you want, open its README, and follow the instructions.

Everything runs on Free Edition: serverless compute, Unity Catalog, Delta Lake. No cloud account, no cluster config.

Which Should I Start With?

New to Databricks? Start with DE Associate Cert Prep - broadest fundamentals.
Want quick reps on a specific concept? Delta Lake exercises or ELT exercises - drill one concept at a time.
Comfortable with batch, new to streaming? Apparel DLT, then Fintech Monitoring.
Preparing for a cert? DE Associate or Spark Developer Associate.
Already shipping pipelines, want to go deeper on performance? Delta Optimization Techniques.

Stay in the Loop

New exercises and labs ship regularly. Follow on LinkedIn or subscribe to the Substack newsletter to be notified when new content drops.

Feedback

Found a bug? Have a suggestion? Open an issue.

Disclaimer: This is an independent educational resource created by Jakub Lasak. Not affiliated with, endorsed by, or sponsored by Databricks, Inc. "Databricks" and "Delta Lake" are trademarks of their respective owners.
