
The Databricks Data Engineer
Helping 18k+ Databricks data engineers become seniors: interview like seniors, execute like seniors, think like seniors.
Episodes
How Photon actually makes your Databricks queries faster (and when it silently doesn't)
Two engineers run the same SQL on the same Delta table. Same data, same cluster size, copy-pasted code. Alex goes to make a coffee and comes back to a query still running. Sam's is done before they finish reading the first Slack message. The only difference is one checkbox on the cluster called Photon. Most Databricks data engineers have that box ticked, pay a premium for it on every DBU, and c
The Databricks interview round nobody studies for (and almost everybody fails)
Picture the debrief room after a Databricks loop. Two candidates went through that day. On paper, a coin flip: SQL tied, Spark internals solid, system design clean for both. Score only the rounds with a rubric and you cannot separate them. And yet the room isn't split. One gets the offer, and the thing that decided it wasn't any of the rounds they studied for. It was the conversation everyo
The Spark Shuffle is baggage claim: why your job waits instead of computes (and more workers won't fix it)
Your Spark job has been running for forty minutes. The dashboard shows your cluster isn't even busy. So you do the obvious thing: add more workers. And it changes nothing. Here's why. During a shuffle, Spark is barely computing at all. It's tagging every row by destination, piling rows together, spilling the overflow to disk, and hauling data across the network between executors. It'
Your Databricks data quality framework is a Yeti: everyone talks about it, nobody has seen it work
An architecture review. A platform team is presenting their data quality setup, and honestly, it's impressive. Expectations on every ingestion table. Drift metrics on the dashboard. A dedicated alerts channel. Then a finance engineer asks the only question that counts: when did this last catch something before one of us did? Silence. That silence is the whole problem. The decks, the suites, the
Why senior Databricks engineers write less code than mid-level ones
Two engineers, same team, both five years in. Last quarter Mark shipped forty-seven pull requests across three pipelines. Sam shipped nine. On any dashboard, Mark wins by a mile. Sam got the staff offer. Mark got a kind note about continuing to demonstrate impact. This isn't politics, and it isn't luck. It's a pattern that specifically catches the engineers who are best at shipping, bec
4 habits that quietly turn your Databricks Delta Lake into a swamp
You built the table right. Well-partitioned, documented, fast enough that the row count came back before you finished reading your own Slack. Six months later it takes four minutes to return that same count, and nobody on your team ever decided to make it that way. There was no meeting, no design doc, no ticket titled "let's make this unqueryable by Q3." A swamp is not a decision. It'
Liquid Clustering vs Z-Ordering: 4 questions that decide
You open your Databricks workspace. Two Delta tables. Same size, same downstream BI workload. Table A was partitioned and z-ordered in 2023, runs fine. Table B is greenfield this quarter, liquid clustering by default. Your tech lead asks how aggressive you want to be with migration tickets. Whatever you type back is probably wrong. This is not a feature swap. It's a paradigm shift, and the migratio
The compounding curve: why some Databricks engineers' salaries grow 5x faster than others
Year one. Two new juniors join the same Databricks platform org. Same starting salary, same skills, same desk. Year three, five thousand bucks apart. Year eight, household-car-and-a-half apart. Every year. Forever. Both worked hard. Both stayed technical. Both got positive reviews. Neither did anything wrong. So what happened? Salary in this field isn't one curve. It's two that look identic
The 90/9/1 rule of Databricks performance work - how to triage Spark optimization in 60 seconds
Your team is three weeks into a Databricks performance push. Broadcast hints in PRs. AQE flags toggled like Christmas lights. Partition counts re-tuned for the third time. The manager is asking, gently, when the gains are showing up in the bill. The staff DE on the next team finished theirs in two afternoons. Same workloads, bigger drop. They were running a triage you have never been taught. In this
The Databricks data engineer in 2026 - the four shifts that just changed your job
You scroll past the cancelled junior req, the "serverless first" line on your director's planning slide, and the third Lakebase mention from your Databricks rep this quarter. Each one looks like a news item. None of them feel like they're about you. They are. Four structural shifts have already happened in the field, and the words "Databricks data engineer" don't mea
9 Behaviors Quietly Killing Your Promotion To Senior Databricks Data Engineer
Mid-level is a down escalator. It looks like flat ground. You feel productive, your tickets close on Friday, your burndown chart is healthy, and your review says "reliable executor of well-defined work" for the third cycle in a row. That sentence is the official label for "not getting promoted this year" - and most Databricks data engineers never decode it. It isn't a skill
The Dashboard Theater: What Databricks Engineers Build That Nobody Opens
You check the usage logs on a dashboard you spent two weeks building. Zero views. Not low views. Zero. The stakeholder who requested it hasn't logged in once. Three months later they ask the exact question the dashboard answers, in a meeting, out loud, as if the dashboard doesn't exist. Because for them, it doesn't. In this episode: - Why the most technically impressive Databricks dashbo
The Preparation Gap: What Interviewers Actually Evaluate in Databricks Data Engineers
She could explain shuffle hash join versus sort merge join. She knew when Adaptive Query Execution kicks in. She had six weeks of notes on Delta Lake, Spark memory, and cluster configs. She walked into the Databricks senior interview feeling genuinely confident. Then the interviewer asked her to walk through a diagnosis, not recite a definition, and everything she studied was aimed at the wrong ta
The Invisible Engineer: Why Your Best Work Gets the Least Recognition
She kept a terabyte-scale pipeline running for six months without a single incident. Not one page, not one late dashboard. Then review season came and the engineer who spent two weekends fixing an outage he partly caused got the promotion instead. Her name wasn't in a single incident report, because when you prevent problems, there's no report to put your name on. In this episode: - Why the











