
The Data Engineering Show
The Data Engineering Show is a podcast for data engineering and BI practitioners that goes beyond theory. It features conversations with influential tech figures about their real-world data challenges and solutions in a casual setting. The show is hosted by the Firebolt Data Bros, including Eldad and Boaz Farkash, who founded Sisense and later Firebolt. Season 2 introduces Benjamin Wagner as a co-host. The podcast aims to provide practical insights for data professionals.
Episodes
AI for Data and Data for AI: The Dual Frontier of Modern Data Engineering with Pranav Motarwar
What if the data engineering skills you have today become obsolete in five years? In this episode, host Benjamin Wagner sits down with Pranav Motarwar, a data engineer who's witnessed the industry's transformation from traditional ETL to AI-powered pipelines, to explore how AI is fundamentally reshaping data engineering roles, why you need to master both "AI for data" and "data for AI" to stay rel
AI Won't Replace Engineers, But This Framework Will Change How They Build with Rohit Girme
What if you could build AI features with confidence while moving at the pace of innovation? In this episode, Benjamin Wagner sits down with Rohit Girme, Staff Software Engineer at Airbnb, to explore how to evaluate generative AI in production, why breaking down complex problems into smaller chunks accelerates development, and the key strategies for scaling AI-powered products beyond zero-to-one. W
The Framework Canva Uses for 200M+ Designers with Paul Tune
In this episode of The Data Engineering Show, Benjamin sits down with Paul Tune, Staff Research Scientist at Canva, to explore the advancement of machine learning at one of the world's leading design platforms. Learn how Canva is transitioning from traditional ML like recommendation engines for templates to cutting-edge agentic workflows that allow users and AI to collaborate on complex design tas
Llama 2 & 3 Safety: Soumya Batra on Agentic AI Training
What if the expertise that built foundation models could reshape how you think about AI's future? In this episode, Benjamin sits down with Soumya Batra, founder and CEO of WisePort AI and former safety lead on Llama 2 and Llama 3 at Meta, to explore how foundation models evolved from traditional NLP, why post-training holds the highest leverage for safety and controllability, and what natively age
The Data Fusion Secret & Why Custom Query Engines Fail with Nikita Lapkov
What if building a distributed SQL engine meant rethinking everything about how query execution works at scale? In this episode, Benjamin sits down with Nikita, Senior Software Engineer at Cloudflare, to explore how R2 SQL leverages object storage and distributed computing to power analytics across 300 global locations, why backward compatibility becomes critical when you can't control infrastruct
How Zipline AI Turns Weeks of Engineering Into Minutes of SQL Queries ft. Nikhil Simha
What if you could deploy ML features and real-time data pipelines without building complex infrastructure from scratch?
In this episode, host Benjamin sits down with Nikhil Simha, CTO at Zipline AI and co-author of Chronon AI, to explore how Chronon, an open-source system that generates data infrastructure from simple queries, is transforming feature engineering at companies like OpenAI and Airb
The Geo-Data Problem Nobody Talks About And How Voi Solved It ft. Magnus Dahlbäck
What if your data platform could power both critical business decisions and real-time product features at scale? In this episode, host Benjamin sits down with Magnus Dahlbäck, Senior Director of Data and Platform at Voi, to explore how a metrics-first approach and semantic layers transform data accessibility, why traditional ML and LLMs require different strategies for different problems, and how
Why 99% of Data Teams Give Up on Real-Time And How Artie Changes That
What happens when a team of seven engineers spends a year trying to build a production-ready CDC connector and fails? For Artie CTO and co-founder Robin Tang, it was the spark needed to build a platform that makes data streaming accessible. In this episode, Robin joins Benjamin to discuss the "DFS" (Depth-First Search) approach to data sources, the engineering hurdles of real-time Postgres-to-Snowf
The $100M Problem: How Lyft's Data Platform Prevents ML Failures with Ritesh Varyani at Lyft
What if your data platform could serve AI-native workloads while scaling reliably across your entire organization? In this episode, Benjamin sits down with Ritesh, Staff Engineer at Lyft, to explore how to build a unified data stack with Spark, Trino, and ClickHouse, why AI is reshaping infrastructure decisions, and the strategies powering one of the industry's most sophisticated data platforms. W
60 Billion Predictions Daily: Inside Credit Karma’s Agentic Data Layer with Maddie Daianu
What does MLOps look like when you are deploying 60 billion machine learning predictions a day? Maddie Daianu, Head of Data and AI at Intuit Credit Karma, joins the Data Bros to pull back the curtain on one of the most high-volume data environments in FinTech. With a 100-person team serving 140 million members, standard data practices break down. Maddie shares how her team manages terabytes of da
Block Bad Data Before the Write with Nike’s Ashok Singamaneni
Nike’s Principal Data Engineer Ashok Singamaneni joins Benjamin and Eldad to discuss his open-source data quality framework, Spark Expectations. Ashok explains how the tool, which was inspired by Databricks DLT Expectations, shifts data quality checks to before the data is written to a final table. This proactive approach uses row-level, aggregation-level, and query data quality checks to fail job
Postgres vs. Elasticsearch: The Unexpected Winner in High-Stakes Search for Instacart with Ankit Mittal
Modernizing Search Infrastructure: How Instacart Transitioned from Elasticsearch to PostgreSQL for Enhanced Performance and Simplicity. In this episode of The Data Engineering Show, host Benjamin Wagner speaks with Ankit Mittal, former senior engineer at Instacart, about the company's innovative approach to modernizing their search infrastructure by transitioning from Elasticsearch to PostgreSQL f
Is Self-Service BI a False Promise? Lei Tang of Fabi.ai Thinks So
AI is reshaping business intelligence by enabling true self-service analytics and transforming how organizations interact with their data through natural language processing. In this episode of The Data Engineering Show, host Benjamin interviews Lei, Co-founder and CTO of Fabi.ai, to explore how AI-native BI platforms are reshaping data analytics and empowering non-technical users to derive meanin
Building Uber's AI Assistant: How Genie Revolutionizes On-Call Support with Paarth Chothani from Uber
In this episode of The Data Engineering Show, the bros speak with Paarth, a Staff Engineer at Uber, about his work on Genie - an innovative AI assistant that revolutionizes on-call support by combining RAG (Retrieval Augmented Generation) with agent-based automation to help engineers find solutions faster.
From Zero to 100M Users: Inside Notion’s Data Stack and AI Strategy with Sumit Gupta
Dive into the future of data engineering with Sumit Gupta, Lead BI Engineer at Notion, as he shares insights with the bros on navigating the AI revolution in modern data stacks. From leveraging tools like Snowflake and dbt to automating content creation with AI, discover how traditional technical skills are evolving alongside the rise of AI. Whether you're a seasoned data professional or just star
How Rising Wave Is Redefining Real-Time Data with Postgres Power
In this episode of The Data Engineering Show, the bros sit with Yingjun Wu, founder and CEO of Rising Wave, to explore the innovative world of stream processing systems. Yingjun shares his journey from academic research to creating a Postgres-compatible streaming system that drastically reduces resource usage. They discuss how Rising Wave's S3-based architecture and Postgres compatibility provide
Revolutionizing Data Governance with DataStrato’s Unified Open Source Approach
In this episode of The Data Engineering Show, the bros sit with Lisa Cao, Product Manager at DataStrato, to explore data catalogs and Apache Gravitino, a unified metadata lake used to manage access and perform data governance for all data sources. They discuss data catalogs and how they refine the data management process.
Database Technology in the Age of AI with DuckDB Labs co-creator Hannes Mühleisen
In this episode of The Data Engineering Show, the bros welcome the CEO of DuckDB Labs and co-creator of DuckDB, Hannes Mühleisen. They delve into the groundbreaking journey of DuckDB, an analytical database that processes billions of queries every month. Learn why DuckDB prioritizes broad compatibility over specialized optimizations, how its extension model works and the emerging solutions for database
AI and Data Movement: Trends and Best Practices with Estuary’s Daniel Pálma
In this episode of The Data Engineering Show, the bros sit with Daniel Pálma, Head of Marketing at Estuary, to delve into the intriguing world of data engineering and marketing. Daniel shares his transition journey into marketing from data engineering and how his technical proficiency has been leveraged to market to engineers. The conversation cuts across the importance of AI in data movement, the
AI and Data Change Management with Chad Sanderson, CEO Gable AI
In this episode of The Data Engineering Show, host Benjamin and co-host Eldad are joined by Chad Sanderson, CEO and co-founder of Gable AI to discuss the revolution of data quality and governance, the importance of understanding data flow and the processes that help organizations manage their data more effectively.
Tech Stacks and Tradeoffs: Xudo's Founder on Picking the Right Tools for BI Success
Wouter Trappers is the founder of Xudo and shares his slightly unconventional path from philosopher to data consultant with the Bros in this latest episode of The Data Engineering Show. Wouter’s grounding in philosophy has proved to be a shaping influence on his approach to business intelligence. Much more than just a software solution, for Wouter, BI is all about change management and aligning le
Data Rewind: Conversation Highlights from Zach Wilson, Matthew Housley, Joe Reis, and Krishnan Viswanathan
This is a special episode of The Data Engineering Show, and joining the Bros is not one guest, nor even two – instead they’re revisiting the best bits from three different fascinating episodes. In each, they spotlight essential trends and lessons learned across the evolving data engineering landscape. From data observability to bridging academia with real-world practice, this episode covers perspe
The Resurgence of SQL: Insights from Ryanne Dolan from LinkedIn
In this episode of The Data Engineering Show, Ryanne Dolan from LinkedIn joins the Bros to discuss LinkedIn's Hoptimator project. Ryanne explains how they’re simplifying complex data workflows by automating them through SQL queries, integrating Kubernetes, Kafka, and Flink. The conversation highlights the shift towards a consumer-driven data model and the future of data engineering.
Vector Databases Won’t Replace SQL - Andy Pavlo
SQL’s slow. SQL’s stupid. We hear these claims every time a new shiny tool enters the market, only to realize five years later when the hype dies down that SQL is actually a good idea. In this super techie episode of the Data Engineering Show, Andy Pavlo, Associate Professor at Carnegie Mellon University, joins the bros to delve into database internals and optimization. Andy discusses leveraging M
How ZoomInfo transitioned from data graveyards to ROI-driven data projects
Too often, expensive resources and man-hours are spent on dashboards no one uses, resulting in zero ROI. Philip Zelitchenko, VP of Data & Analytics at ZoomInfo, met the bros to talk about adopting product management principles to ensure data projects have value, and to provide an unfiltered peek into ZoomInfo’s data stack and unique tech culture.
Matthew Weingarten from Disney Streaming about Data Quality Best Practices
Matthew Weingarten, Lead Data Engineer at Disney Streaming, talks about principles essential for data quality, cost optimization, debugging, and data modeling, as adopted by the world's leading companies.
Joseph Machado, Senior Data Engineer @ LinkedIn talks best practices
Data engineering should be less about the stack and more about best practices. While tools may change, foundational principles will remain constant. Joseph Machado, Senior Data Engineer at LinkedIn, is on The Data Engineering Show to talk about principles that are key to success, leveraging AI for automation, and adopting software engineering methods.
Professors Joe Hellerstein and Joseph Gonzalez on LLMs
Joe Hellerstein is the Jim Gray Professor of Computer Science at Berkeley and Joseph Gonzalez is an Associate Professor in the Electrical Engineering and Computer Science department. They’ve inspired generations of database enthusiasts (including Benji and Eldad) and have come on the show to talk about all things LLM and RunLLM which they co-founded. If you consider yourself a hardcore engineer, th
Megan Lieu on powerful notebooks that enable collaboration
There are two types of data influencers on LinkedIn:
1. Those who talk directly about the products and companies they work for
2. Those who provide more general guidance, tips and opinions
Can influencers actually be passionate about the products they’re developing and straightforwardly talk about them without sounding salesy? We’re kicking off 2024 with the amazing Megan Lieu on a new Data Engine
Transitioning from software engineering to data engineering
Every data team should have at least one data engineer with a software engineering background. Our guest this time on The Data Engineering Show, Xiaoxu Gao, is an inspiring Python and data engineering expert with 10.6K followers on Medium. She’s a data engineer at Adyen with a software engineering background, and she met the bros to talk about why both software and data engineering skills are so important. Wi
Vin Vashishta explains why we should stop using dashboards
Vin Vashishta, the guy we all love to follow, has never seen a dashboard with positive ROI. This time on The Data Engineering Show, he met the bros to talk about the difference between BI dashboards and analytics that actually introduce knowledge. It’s no longer just about the data volume; it’s about quality and relevance.
Joe Reis and Matt Housley on the fundamentals of data engineering
After co-writing the best-selling book ‘Fundamentals of Data Engineering’, Joe Reis and Matt Housley joined the bros for some much-needed ranting, priceless data advice, and good laughs. So why are we still talking about providing business value and dashboards, even though we don’t really have anything new to say? If there are so many great tools in the data stack, why are we still so troubled? Ho
Bill Inmon, the Godfather of Data Warehousing
As people in the data industry go, Bill Inmon is among the top, often seen as the godfather of the data warehouse. In this Data Engineering Show episode, Bill Inmon talks about surviving rabbit holes throughout the evolution of data, the data modeling renaissance, and why ChatGPT is not Textual ETL.
Large-scale data engineering at Momentive.ai - Meenal Iyer
As companies scale, data gets messy. The data team says one thing, the business team says something completely different. Meenal Iyer, VP Data at Momentive.ai, met the Data Bros to talk about enforcing collaboration in large organizations to ensure what she considers the three most important data factors: Adoption, Trust, and Value.
Data engineering from the early 2000s till today - BlackRock
When it comes to data management, have we come a long way since the early 2000s? Or has it simply taken us 20 years to finally realize that you can’t scale properly without data modeling? With over 20 years of experience in the data space, leading engineering teams at Cisco, Oracle, Greenplum, and now as Sr. Director of Engineering at BlackRock, Krishnan Viswanathan talks about the data engineerin
Zach Wilson on what makes a great data engineer
How good you are at Spark or Flink ≠ how good you are at data engineering. After years of data engineering experience at Airbnb, Netflix, and Facebook, Zach Wilson is now focused on spreading the knowledge in EcZachly and all over social media. He met Benjamin Wagner to explain why data modeling and storytelling are more important than the actual tech, why data engineering is going to see more job
How ZipRecruiter and Yotpo power self-service data platforms that work
Data engineers are not paid to do support. Liran Yogev, Director of Engineering at ZipRecruiter, and Doron Porat, Director of Infrastructure at Yotpo talk about building resilient self-service products that keep customers happy and engineers calm. They walked the bros through their data stacks and explained how ZipRecruiter is completely rebuilding its data layer from scratch.
Data Observability with Millions of Users - Barr Moses
Barr Moses, CEO of Monte Carlo explains the difference between data quality and data observability, and how to make sure your data is accurate in a world where so many different teams are accessing it.
How Amplitude Engineers Process 5 Trillion Real-time Events
Weichen Wang, Senior Engineering Manager at Amplitude, came to meet the bros to talk about Amplitude's cutting-edge data stack and how it processes 5 Trillion real-time events while dealing with mutable data and massive scale.
Making Observability a Key Business Driver
80% of the code that you write doesn’t work on the first try. And that’s fine. But knowing which 80% is not working and which 20% is working is the actual challenge. After 10 years at Facebook, managing and scaling the Seattle site to over 6000 engineers(!), Vijaye Raji founded Statsig to make observability automated and real-time. How is the semantic layer managed? How was the Statsig team able to
A ClickHouse Review from a Practitioner’s Point of View
Sudeep Kumar, Principal Engineer at Salesforce, is a ClickHouse fan. He considers the shift to ClickHouse one of his biggest accomplishments during his eBay days and walks Boaz through his experience with the platform: how on one hand it handled 2B events per minute, but also how it required rollups which compromised granularity when extending time windows.
Besides a ClickHouse review from a pr
The Creator of Airflow About His Recipe for Smart Data-Driven Companies
According to Maxime Beauchemin, CEO & Founder at Preset and Creator of Apache Superset and Apache Airflow, building a thriving company is not so straightforward. So how did he do it?
Choosing the right system and services is key for a successful start, and can help you avoid the chaos of having too many tools spread across multiple teams.
Max walks the Bros through his recipe for a smart
How Similarweb Delivers Customer Facing Analytics Over 100s of TBs
According to Yoav Shmaria, VP R&D Platform at Similarweb, the best way to manage data warehouse costs is tagging every table, database or ETL running to have good granularity over every feature.
Besides handy cost management tips, Yoav walks the bros through the tech stack he implemented to analyze 100s of TBs of web data to serve fast customer-facing analytics.
Full disclosure, Similarweb
How Klarna Designed a New Data Platform in the Cloud
Klarna is one of the leading fintech companies in the world, valued at $45B.
While many corporations are “stuck” on-prem, Klarna made the move and today is a cloud-only company. Gunnar Tangring, Klarna’s Lead Data Engineer tells Boaz what this new modernized stack looks like.
How Eventbrite is Modernizing its Data Stack
Archana Ganapathi, Head of Data & Analytics Engineering at Eventbrite, shares Eventbrite’s data stack modernization process, and how you get engineers to adopt new technologies like dbt which may be outside their comfort zone.
A Deep Dive into Slack's Data Architecture
Growing from a startup to an IPOed and then an acquired company meant that Slack’s sales org was scaling rapidly.
Apun Hiran, Slack’s Director of Software Engineering explains how the data stack and architecture evolved to support this growth with more reliable and timely metrics.
Speaker: Apun Hiran, Director of Software Engineering (Data), Slack
Hosts: Eldad and Boaz Farkash, CEO and CPO, Fi
Transitioning Scopely’s 5.5 PB Data Platform to the Modern Data Stack
Should data engineering AND BI be handled by the same people? According to Jonathan Palmer, VP Data Platform at Scopely – YES. By Analytics Engineers.
His team of Analytics Engineers is in the final stages of transitioning 5.5 PBs of data, which include 15B events per day, to the modern data stack. Tune in to learn how they did it.
Getting rid of raw data with Jens Larsson
Why would you create ugly data? According to Jens Larsson, don’t even go near raw data. Jens started off at Google, continued to manage data science at Spotify, caught the startup bug at Tink, and recently joined an exciting new company called Ark Kapital, together with Spotify’s former VP Analytics. Jens explains how he and his team killed the notion of raw data at Tink and walks us through the G
How Zendesk engineers manage customer-facing data applications
This time on The Data Engineering Show, Eldad abandoned his brother Boaz, but it’s OK because Boaz got the full 30 minutes to talk to one of the most interesting people in the data space.
Ananth Packkildurai is Principal Software Engineer at Zendesk and runs one of the strongest newsletters in data – Data Engineering Weekly.
He talked about data applications at Zendesk and how they’re built, te
How are those data intensive customer facing apps engineered at Gong?
Gong manages hundreds of thousands of videoconferences and millions of emails PER DAY, which add up to hundreds of TBs.
The Data Bros met Yarin Benado, Gong’s engineering manager to understand what is required to move to a modern data stack to support all this, what this stack looks like, and why it all comes down to data quality at the end of the day.
How Bolt Engineers Are Designing Its Next-Gen Data Platform
Bolt's ride-hailing app serves over 75M users in Europe and Africa and handles 500K queries every day.
Erik Heintare along with Bolt's engineering team is in the midst of designing a new next-gen data platform and is sharing how it's going to solve their biggest data challenges.
Guest: Erik Heintare - Senior Analytics Engineer at Bolt
Hosts: Eldad and Boaz Farkash, AKA The Data Bros
How did Agoda scale its data platform to support 1.5T events per day?
Scaling a data platform to support 1.5T events per day requires complicated technical migrations and alignment between hundreds of engineers. Want to see how Agoda did it?
Guests:
Amir Arad, Director of Machine Learning, Agoda
Shaun Sit, Senior Dev Manager, Agoda
Hosts:
The Data Bros - Eldad and Boaz Farkash
Diving Into GitHub's Data Stack
It’s the mother of all development projects. You use it daily. And so do 65M developers around the world. This time on The Data Engineering Show: a deep dive into GitHub’s data stack. Arfon Smith and KimYen (Truong) Ladia shared GitHub’s data engineering challenges and solutions and explained why every developer should know and adopt the ADR protocol.
Building Data Products For Data Engineers
What does a tech stack that always needs to be at the forefront of technology look like?
Roy Miara from Explorium talks about building data products for the audience that can’t be fooled – Data Engineers.
How Vimeo Keeps Data Intact with 85B Events Per Month
How does the Vimeo data team deal with 2 PBs of data and 85B events per month? What made them recently build a data ops team? What data tool does the team love? And why (the hell) did they call their legacy platform Fatal Attraction? Guest: Lior Solomon, VP Data Engineering at Vimeo.
How Substack's Data Stack Supports 500K Paying Subscribers
Substack is an amazing — if not the most amazing — content publishing platform out there. Essentially, it allows anyone to become a journalist or to start their own newsletters and charge subscriptions for them. So how did they build a data stack that can support all of their 500K paying subscribers?
Guest: Mike Cohen, Data Engineer at Substack
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO an
A Technical Deep Dive to Yelp's Data Infrastructure - With Steven Moy
As an expert in query engines and performance-related challenges, Steven Moy explains how Yelp handled its huge data growth in the past ten years.
Guest: Steven Moy, Software Engineer at Yelp
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt
How Canva's Data Engineers and Analysts Support 55M Active Users
Canva is one of the hottest, if not the hottest, graphic design platforms out there. Only a week ago it was announced that they reached a staggering 16 Billion dollar valuation, after having seen even stronger growth during the pandemic. With 55 million active users and around 500 million dollars in annual revenue, it seems that Canva is unstoppable.
So how do Canva analysts and engineers scale
How AppsFlyer Delivers Sub-Second BI to 1000 Looker Users - With Alexandra Sudilovsky
AppsFlyer has exploded in size, growing from a small company of 200 people to 1000 people in just three years. Dealing not only with a huge amount of data on a daily basis but doing so while growing quickly as a company can come with many challenges.
Guest: Alexandra Sudilovsky, Senior BI Expert at AppsFlyer
Hosts: The Data Bros, Eldad and Boaz Farkash, CEO and CPO at Firebolt
The Data Engineering Show - Coming Soon...
The Data Engineering Show is a podcast for data engineering and BI practitioners to go beyond theory, and learn from the biggest influencers in tech about their practical day to day data challenges and solutions in a casual and fun setting.