The TWIML AI Podcast (formerly This Week in Machine Learning & Artificial Intelligence)

How AI Learns to Smell with Alex Wiltschko - #771 Jul 8, 2026 3595 In this episode, Alex Wiltschko, founder and CEO of Osmo, joins the show to discuss his goal of giving computers a sense of smell and what it takes to build olfactory intelligence. We explore the science behind smell, from the hundreds of olfactory receptors in the human nose to the challenge of mapping the relationship between molecular structure and odor, ensuring safety regulations are met, an

Why AI Agents Break the GenAI Security Model with Devvret Rishi - #770 Jun 16, 2026 3378 In this episode, Sam talks with Dev Rishi, GM of AI at Rubrik, about what happens when agents move beyond answering questions and start taking action across tools, systems, and business processes. We explore why the enterprise playbook of static guardrails plus human approval starts to break down in the agent era. Agents are useful because they can plan, call tools, update systems, write code, se

Is RAG Dead? Lessons from Building AI for Tax Law with Alex Bowcut - #769 Jun 9, 2026 3092 As context windows grow into the millions of tokens, many AI practitioners are questioning whether retrieval-augmented generation (RAG) is still necessary. If modern models can ingest entire libraries of documents, why bother with retrieval at all? In this episode, Alex Bowcut, Head of Engineering at Sphere, explains why the answer depends on the application. Sphere uses AI to automate global tax

Relational Foundation Models for Enterprise Data with Jure Leskovec - #768 May 21, 2026 3983 In this episode, Jure Leskovec, co-founder and chief scientist at Kumo and professor of computer science at Stanford, joins us to explore two fronts of his work: AI for science and relational deep learning. We begin with AI Virtual Cell, a multiscale effort to learn data-driven representations from proteins to cells to patients using single-cell RNA-seq data, protein language models like ESM, and

How to Find the Agent Failures Your Evals Miss with Scott Clark - #767 May 7, 2026 3199 In this episode, Scott Clark, co-founder and CEO of Distributional, joins us to explore how teams can reliably operate and improve complex LLM systems and agents in production. Scott introduces a Maslow’s hierarchy of observability: telemetry for logging, monitoring for known signals, and post-production or online analytics to surface unknown unknowns. We dig into examples of real-world failures S

How to Engineer AI Inference Systems with Philip Kiely - #766 Apr 30, 2026 3291 In this episode, Philip Kiely, head of AI education at Baseten, joins us to unpack the fast-evolving discipline of inference engineering. We explore why inference has become the stickiest and most critical workload in AI, how it blends GPU programming, applied research, and large-scale distributed systems, and where the line sits between inference and model serving. Philip shares how research-to-p

How Capital One Delivers Multi-Agent Systems with Rashmi Shetty - #765 Apr 16, 2026 3258 In this episode, Rashmi Shetty, senior director of enterprise generative AI platform at Capital One, joins us to explore how the company is designing, deploying, and scaling multi-agent systems in a highly regulated environment. Rashmi walks us through Chat Concierge, a multi-agent chat experience for auto dealerships that handles intent disambiguation, tool invocation, and human handoffs to deliv

The Race to Production-Grade Diffusion LLMs with Stefano Ermon - #764 Mar 26, 2026 3798 Today, we're joined by Stefano Ermon, associate professor at Stanford University and CEO of Inception Labs to discuss diffusion language models. We dig into how diffusion approaches—traditionally used for images—are being adapted for text and code generation, the technical challenges of applying continuous methods to discrete token spaces, and how diffusion models compare to traditional autoregres

Agent Swarms and Knowledge Graphs for Autonomous Software Development with Siddhant Pardeshi - #763 Mar 10, 2026 4574 In this episode, Sid Pardeshi, co-founder and CTO of Blitzy, joins us to discuss building autonomous development systems able to deliver production-ready software at enterprise scale. Sid contrasts AI-assisted coding with end-to-end autonomy, arguing that “code is a commodity” and acceptance is the real metric—security, standards, tests, and maintainability included. We explore Blitzy’s hybrid gra

AI Trends 2026: OpenClaw Agents, Reasoning LLMs, and More with Sebastian Raschka - #762 Feb 26, 2026 4735 In this episode, Sebastian Raschka, independent LLM researcher and author, joins us to break down how the LLM landscape has changed over the past year and what is likely to matter most in 2026. We discuss the shift from raw model scaling to reasoning-focused post-training, inference-time techniques, and better tool integration. Sebastian explains why methods like self-consistency, self-refinement,

The Evolution of Reasoning in Small Language Models with Yejin Choi - #761 Jan 29, 2026 3981 Today, we're joined by Yejin Choi, professor and senior fellow at Stanford University in the Computer Science Department and the Institute for Human-Centered AI (HAI). In this conversation, we explore Yejin’s recent work on making small language models reason more effectively. We discuss how high-quality, diverse data plays a central role in closing the intelligence gap between small and large mod

Intelligent Robots in 2026: Are We There Yet? with Nikita Rudin - #760 Jan 8, 2026 3997 Today, we're joined by Nikita Rudin, co-founder and CEO of Flexion Robotics to discuss the gap between current robotic capabilities and what’s required to deploy fully autonomous robots in the real world. Nikita explains how reinforcement learning and simulation have driven rapid progress in robot locomotion—and why locomotion is still far from “solved.” We dig into the sim2real gap, and how addin

Rethinking Pre-Training for Agentic AI with Aakanksha Chowdhery - #759 Dec 17, 2025 3174 Today, we're joined by Aakanksha Chowdhery, member of technical staff at Reflection, to explore the fundamental shifts required to build true agentic AI. While the industry has largely focused on post-training techniques to improve reasoning, Aakanksha draws on her experience leading pre-training efforts for Google’s PaLM and early Gemini models to argue that pre-training itself must be rethought

Why Vision Language Models Ignore What They See with Munawar Hayat - #758 Dec 9, 2025 3460 In this episode, we’re joined by Munawar Hayat, researcher at Qualcomm AI Research, to discuss a series of papers presented at NeurIPS 2025 focusing on multimodal and generative AI. We dive into the persistent challenge of object hallucination in Vision-Language Models (VLMs), why models often discard visual information in favor of pre-trained language priors, and how his team used attention-guide

Scaling Agentic Inference Across Heterogeneous Compute with Zain Asgar - #757 Dec 2, 2025 2924 In this episode, Zain Asgar, co-founder and CEO of Gimlet Labs, joins us to discuss heterogeneous AI inference across diverse hardware. Zain argues that the current industry standard of running all AI workloads on high-end GPUs is unsustainable for agents, which consume significantly more tokens than traditional LLM applications. We explore Gimlet’s approach to heterogeneous inference, which i

Proactive Agents for the Web with Devi Parikh - #756 Nov 19, 2025 3364 Today, we're joined by Devi Parikh, co-founder and co-CEO of Yutori, to discuss browser use models and a future where we interact with the web through proactive, autonomous agents. We explore the technical challenges of creating reliable web agents, the advantages of visually-grounded models that operate on screenshots rather than the browser’s more brittle document object model, or DOM, and why t

AI Orchestration for Smart Cities and the Enterprise with Robin Braun and Luke Norris - #755 Nov 12, 2025 3286 Today, we're joined by Robin Braun, VP of AI business development for hybrid cloud at HPE, and Luke Norris, co-founder and CEO of Kamiwaza, to discuss how AI systems can be used to automate complex workflows and unlock value from legacy enterprise data. Robin and Luke detail high-impact use cases from HPE and Kamiwaza’s collaboration on an “Agentic Smart City” project for Vail, Colorado, including

Building an AI Mathematician with Carina Hong - #754 Nov 4, 2025 3352 In this episode, Carina Hong, founder and CEO of Axiom, joins us to discuss her work building an "AI Mathematician." Carina explains why this is a pivotal moment for AI in mathematics, citing a convergence of three key areas: the advanced reasoning capabilities of modern LLMs, the rise of formal proof languages like Lean, and breakthroughs in code generation. We explore the core technical challeng

High-Efficiency Diffusion Models for On-Device Image Generation and Editing with Hung Bui - #753 Oct 28, 2025 3143 In this episode, Hung Bui, Technology Vice President at Qualcomm, joins us to explore the latest high-efficiency techniques for running generative AI, particularly diffusion models, on-device. We dive deep into the technical challenges of deploying these models, which are powerful but computationally expensive due to their iterative sampling process. Hung details his team's work on SwiftBrush and

Vibe Coding's Uncanny Valley with Alexandre Pesant - #752 Oct 22, 2025 4356 Today, we're joined by Alexandre Pesant, AI lead at Lovable, to discuss the evolution and practice of vibe coding. Alex shares his take on how AI is enabling a shift in software development from typing characters to expressing intent, creating a new layer of abstraction similar to how high-level code compiles to machine code. We explore the current capabilities and limitations of codi

Dataflow Computing for AI Inference with Kunle Olukotun - #751 Oct 14, 2025 3457 In this episode, we're joined by Kunle Olukotun, professor of electrical engineering and computer science at Stanford University and co-founder and chief technologist at SambaNova Systems, to discuss reconfigurable dataflow architectures for AI inference. Kunle explains the core idea of building computers that are dynamically configured to match the dataflow graph of an AI model, moving beyond the

Recurrence and Attention for Long-Context Transformers with Jacob Buckman - #750 Oct 7, 2025 3443 Today, we're joined by Jacob Buckman, co-founder and CEO of Manifest AI to discuss achieving long context in transformers. We discuss the bottlenecks of scaling context length and recent techniques to overcome them, including windowed attention, grouped query attention, and latent space attention. We explore the idea of weight-state balance and the weight-state FLOP ratio as a way of reasoning abo

The Decentralized Future of Private AI with Illia Polosukhin - #749 Sep 30, 2025 3903 In this episode, Illia Polosukhin, a co-author of the seminal "Attention Is All You Need" paper and co-founder of Near AI, joins us to discuss his vision for building private, decentralized, and user-owned AI. Illia shares his unique journey from developing the Transformer architecture at Google to building the NEAR Protocol blockchain to solve global payment challenges, and now applying those dec

Inside Nano Banana 🍌 and the Future of Vision-Language Models with Oliver Wang - #748 Sep 23, 2025 3819 Today, we’re joined by Oliver Wang, principal scientist at Google DeepMind and tech lead for Gemini 2.5 Flash Image—better known by its code name, “Nano Banana.” We dive into the development and capabilities of this newly released frontier vision-language model, beginning with the broader shift from specialized image generators to general-purpose multimodal agents that can use both visual and text

Is It Time to Rethink LLM Pre-Training? with Aditi Raghunathan - #747 Sep 16, 2025 3506 Today, we're joined by Aditi Raghunathan, assistant professor at Carnegie Mellon University, to discuss the limitations of LLMs and how we can build more adaptable and creative models. We dig into her ICML 2025 Outstanding Paper Award winner, “Roll the dice & look before you leap: Going beyond the creative limits of next-token prediction,” which examines why LLMs struggle with generating truly nov

Building an Immune System for AI Generated Software with Animesh Koratana - #746 Sep 9, 2025 3911 Today, we're joined by Animesh Koratana, founder and CEO of PlayerZero to discuss his team’s approach to making agentic and AI-assisted coding tools production-ready at scale. Animesh explains how rapid advances in AI-assisted coding have created an “asymmetry” where the speed of code output outpaces the maturity of processes for maintenance and support. We explore PlayerZero’s debugging and code

Autoformalization and Verifiable Superintelligence with Christian Szegedy - #745 Sep 2, 2025 4308 In this episode, Christian Szegedy, Chief Scientist at Morph Labs, joins us to discuss how the application of formal mathematics and reasoning enables the creation of more robust and safer AI systems. A pioneer behind concepts like the Inception architecture and adversarial examples, Christian now focuses on autoformalization—the AI-driven process of translating mathematical concepts from their hu

Multimodal AI Models on Apple Silicon with MLX with Prince Canuma - #744 Aug 26, 2025 4220 Today, we're joined by Prince Canuma, an ML engineer and open-source developer focused on optimizing AI inference on Apple Silicon devices. Prince shares his journey to becoming one of the most prolific contributors to Apple’s MLX ecosystem, having published over 1,000 models and libraries that make open, multimodal AI accessible and performant on Apple devices. We explore his workflow for adaptin

Genie 3: A New Frontier for World Models with Jack Parker-Holder and Shlomi Fruchter - #743 Aug 19, 2025 3661 Today, we're joined by Jack Parker-Holder and Shlomi Fruchter, researchers at Google DeepMind, to discuss the recent release of Genie 3, a model capable of generating “playable” virtual worlds. We dig into the evolution of the Genie project and review the current model’s scaled-up capabilities, including creating real-time, interactive, and high-resolution environments. Jack and Shlomi share their

Closing the Loop Between AI Training and Inference with Lin Qiao - #742 Aug 12, 2025 3671 In this episode, we're joined by Lin Qiao, CEO and co-founder of Fireworks AI. Drawing on key lessons from her time building PyTorch, Lin shares her perspective on the modern generative AI development lifecycle. She explains why aligning training and inference systems is essential for creating a seamless, fast-moving production pipeline, preventing the friction that often stalls deployment. We exp

Context Engineering for Productive AI Agents with Filip Kozera - #741 Jul 29, 2025 2761 In this episode, Filip Kozera, founder and CEO of Wordware, explains his approach to building agentic workflows where natural language serves as the new programming interface. Filip breaks down the architecture of these "background agents," explaining how they use a reflection loop and tool-calling to execute complex tasks. He discusses the current limitations of agent protocols like MCPs and how

Infrastructure Scaling and Compound AI Systems with Jared Quincy Davis - #740 Jul 22, 2025 4382 In this episode, Jared Quincy Davis, founder and CEO at Foundry, introduces the concept of "compound AI systems," which allows users to create powerful, efficient applications by composing multiple, often diverse, AI models and services. We discuss how these "networks of networks" can push the Pareto frontier, delivering results that are simultaneously faster, more accurate, and even cheaper than

Building Voice AI Agents That Don’t Suck with Kwindla Kramer - #739 Jul 15, 2025 4382 In this episode, Kwindla Kramer, co-founder and CEO of Daily and creator of the open source Pipecat framework, joins us to discuss the architecture and challenges of building real-time, production-ready conversational voice AI. Kwin breaks down the full stack for voice agents—from the models and APIs to the critical orchestration layer that manages the complexities of multi-turn conversations. We

Distilling Transformers and Diffusion Models for Robust Edge Use Cases with Fatih Porikli - #738 Jul 9, 2025 3629 Today, we're joined by Fatih Porikli, senior director of technology at Qualcomm AI Research for an in-depth look at several of Qualcomm's accepted papers and demos featured at this year’s CVPR conference. We start with “DiMA: Distilling Multi-modal Large Language Models for Autonomous Driving,” an end-to-end autonomous driving system that incorporates distilling large language models for structure

Building the Internet of Agents with Vijoy Pandey - #737 Jun 24, 2025 3373 Today, we're joined by Vijoy Pandey, SVP and general manager at Outshift by Cisco to discuss a foundational challenge for the enterprise: how do we make specialized agents from different vendors collaborate effectively? As companies like Salesforce, Workday, and Microsoft all develop their own agentic systems, integrating them creates a complex, probabilistic, and noisy environment, a stark contra

LLMs for Equities Feature Forecasting at Two Sigma with Ben Wellington - #736 Jun 17, 2025 3571 Today, we're joined by Ben Wellington, deputy head of feature forecasting at Two Sigma. We dig into the team’s end-to-end approach to leveraging AI in equities feature forecasting, covering how they identify and create features, collect and quantify historical data, and build predictive models to forecast market behavior and asset prices for trading and investment. We explore the firm's platform-c

Zero-Shot Auto-Labeling: The End of Annotation for Computer Vision with Jason Corso - #735 Jun 10, 2025 3405 Today, we're joined by Jason Corso, co-founder of Voxel51 and professor at the University of Michigan, to explore automated labeling in computer vision. Jason introduces FiftyOne, an open-source platform for visualizing datasets, analyzing models, and improving data quality. We focus on Voxel51’s recent research report, “Zero-shot auto-labeling rivals human performance,” which demonstrates how zer

Grokking, Generalization Collapse, and the Dynamics of Training Deep Neural Networks with Charles Martin - #734 Jun 5, 2025 5121 Today, we're joined by Charles Martin, founder of Calculation Consulting, to discuss WeightWatcher, an open-source tool for analyzing and improving Deep Neural Networks (DNNs) based on principles from theoretical physics. We explore the foundations of the Heavy-Tailed Self-Regularization (HTSR) theory that underpins it, which combines random matrix theory and renormalization group ideas to uncove

Google I/O 2025 Special Edition - #733 May 28, 2025 1581 Today, I’m excited to share a special crossover edition of the podcast recorded live from Google I/O 2025! In this episode, I join Shawn Wang aka Swyx from the Latent Space Podcast, to interview Logan Kilpatrick and Shrestha Basu Mallick, PMs at Google DeepMind working on AI Studio and the Gemini API, along with Kwindla Kramer, CEO of Daily and creator of the Pipecat open source project. We cover

RAG Risks: Why Retrieval-Augmented LLMs are Not Safer with Sebastian Gehrmann - #732 May 21, 2025 3429 Today, we're joined by Sebastian Gehrmann, head of responsible AI in the Office of the CTO at Bloomberg, to discuss AI safety in retrieval-augmented generation (RAG) systems and generative AI in high-stakes domains like financial services. We explore how RAG, contrary to some expectations, can inadvertently degrade model safety. We cover examples of unsafe outputs that can emerge from these system

From Prompts to Policies: How RL Builds Better AI Agents with Mahesh Sathiamoorthy - #731 May 13, 2025 3685 Today, we're joined by Mahesh Sathiamoorthy, co-founder and CEO of Bespoke Labs, to discuss how reinforcement learning (RL) is reshaping the way we build custom agents on top of foundation models. Mahesh highlights the crucial role of data curation, evaluation, and error analysis in model performance, and explains why RL offers a more robust alternative to prompting, and how it can improve multi-s

How OpenAI Builds AI Agents That Think and Act with Josh Tobin - #730 May 6, 2025 4047 Today, we're joined by Josh Tobin, member of technical staff at OpenAI, to discuss the company’s approach to building AI agents. We cover OpenAI's three agentic offerings—Deep Research for comprehensive web research, Operator for website navigation, and Codex CLI for local code execution. We explore OpenAI’s shift from simple LLM workflows to reasoning models specifically trained for multi-step ta

CTIBench: Evaluating LLMs in Cyber Threat Intelligence with Nidhi Rastogi - #729 Apr 30, 2025 3378 Today, we're joined by Nidhi Rastogi, assistant professor at Rochester Institute of Technology to discuss Cyber Threat Intelligence (CTI), focusing on her recent project CTIBench—a benchmark for evaluating LLMs on real-world CTI tasks. Nidhi explains the evolution of AI in cybersecurity, from rule-based systems to LLMs that accelerate analysis by providing critical context for threat detection and

Generative Benchmarking with Kelly Hong - #728 Apr 23, 2025 3257 In this episode, Kelly Hong, a researcher at Chroma, joins us to discuss "Generative Benchmarking," a novel approach to evaluating retrieval systems, like RAG applications, using synthetic data. Kelly explains how traditional benchmarks like MTEB fail to represent real-world query patterns and how embedding models that perform well on public benchmarks often underperform in production. The convers

Exploring the Biology of LLMs with Circuit Tracing with Emmanuel Ameisen - #727 Apr 14, 2025 5646 In this episode, Emmanuel Ameisen, a research engineer at Anthropic, returns to discuss two recent papers: "Circuit Tracing: Revealing Language Model Computational Graphs" and "On the Biology of a Large Language Model." Emmanuel explains how his team developed mechanistic interpretability methods to understand the internal workings of Claude by replacing dense neural network components with sparse

Teaching LLMs to Self-Reflect with Reinforcement Learning with Maohao Shen - #726 Apr 8, 2025 3105 Today, we're joined by Maohao Shen, PhD student at MIT to discuss his paper, “Satori: Reinforcement Learning with Chain-of-Action-Thought Enhances LLM Reasoning via Autoregressive Search.” We dig into how Satori leverages reinforcement learning to improve language model reasoning—enabling model self-reflection, self-correction, and exploration of alternative solutions. We explore the Chain-of-Acti

Waymo's Foundation Model for Autonomous Driving with Drago Anguelov - #725 Mar 31, 2025 4147 Today, we're joined by Drago Anguelov, head of AI foundations at Waymo, for a deep dive into the role of foundation models in autonomous driving. Drago shares how Waymo is leveraging large-scale machine learning, including vision-language models and generative AI techniques to improve perception, planning, and simulation for its self-driving vehicles. The conversation explores the evolution of Way

Dynamic Token Merging for Efficient Byte-level Language Models with Julie Kallini - #724 Mar 24, 2025 3032 Today, we're joined by Julie Kallini, PhD student at Stanford University to discuss her recent papers, “MrT5: Dynamic Token Merging for Efficient Byte-level Language Models” and “Mission: Impossible Language Models.” For the MrT5 paper, we explore the importance and failings of tokenization in large language models—including inefficient compression rates for under-resourced languages—and dig into

Scaling Up Test-Time Compute with Latent Reasoning with Jonas Geiping - #723 Mar 17, 2025 3518 Today, we're joined by Jonas Geiping, research group leader at Ellis Institute and the Max Planck Institute for Intelligent Systems to discuss his recent paper, “Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach.” This paper proposes a novel language model architecture which uses recurrent depth to enable “thinking in latent space.” We dig into “internal reasoning” ver

Imagine while Reasoning in Space: Multimodal Visualization-of-Thought with Chengzu Li - #722 Mar 10, 2025 2531 Today, we're joined by Chengzu Li, PhD student at the University of Cambridge to discuss his recent paper, “Imagine while Reasoning in Space: Multimodal Visualization-of-Thought.” We explore the motivations behind MVoT, its connection to prior work like TopViewRS, and its relation to cognitive science principles such as dual coding theory. We dig into the MVoT framework along with its various task

Inside s1: An o1-Style Reasoning Model That Cost Under $50 to Train with Niklas Muennighoff - #721 Mar 3, 2025 2969 Today, we're joined by Niklas Muennighoff, a PhD student at Stanford University, to discuss his paper, “s1: Simple Test-Time Scaling.” We explore the motivations behind s1, as well as how it compares to OpenAI's o1 and DeepSeek's R1 models. We dig into the different approaches to test-time scaling, including parallel and sequential scaling, as well as s1’s data curation process, its training recip

Accelerating AI Training and Inference with AWS Trainium2 with Ron Diamant - #720 Feb 24, 2025 4025 Today, we're joined by Ron Diamant, chief architect for Trainium at Amazon Web Services, to discuss hardware acceleration for generative AI and the design and role of the recently released Trainium2 chip. We explore the architectural differences between Trainium and GPUs, highlighting its systolic array-based compute design, and how it balances performance across key dimensions like compute, memor

π0: A Foundation Model for Robotics with Sergey Levine - #719 Feb 18, 2025 3150 Today, we're joined by Sergey Levine, associate professor at UC Berkeley and co-founder of Physical Intelligence, to discuss π0 (pi-zero), a general-purpose robotic foundation model. We dig into the model architecture, which pairs a vision language model (VLM) with a diffusion-based action expert, and the model training "recipe," emphasizing the roles of pre-training and post-training with a diver

AI Trends 2025: AI Agents and Multi-Agent Systems with Victor Dibia - #718 Feb 10, 2025 6299 Today we’re joined by Victor Dibia, principal research software engineer at Microsoft Research, to explore the key trends and advancements in AI agents and multi-agent systems shaping 2025 and beyond. In this episode, we discuss the unique abilities that set AI agents apart from traditional software systems–reasoning, acting, communicating, and adapting. We also examine the rise of agentic foundat

Speculative Decoding and Efficient LLM Inference with Chris Lott - #717 Feb 4, 2025 4590 Today, we're joined by Chris Lott, senior director of engineering at Qualcomm AI Research to discuss accelerating large language model inference. We explore the challenges presented by LLM encoding and decoding (aka generation) and how these interact with various hardware constraints such as FLOPS, memory footprint and memory bandwidth to limit key inference metrics such as time-to-first-token

Ensuring Privacy for Any LLM with Patricia Thaine - #716 Jan 28, 2025 3093 Today, we're joined by Patricia Thaine, co-founder and CEO of Private AI to discuss techniques for ensuring privacy, data minimization, and compliance when using 3rd-party large language models (LLMs) and other AI services. We explore the risks of data leakage from LLMs and embeddings, the complexities of identifying and redacting personal information across various data flows, and the approach Pr

AI Engineering Pitfalls with Chip Huyen - #715 Jan 21, 2025 3457 Today, we're joined by Chip Huyen, independent researcher and writer to discuss her new book, “AI Engineering.” We dig into the definition of AI engineering, its key differences from traditional machine learning engineering, the common pitfalls encountered in engineering AI systems, and strategies to overcome them. We also explore how Chip defines AI agents, their current limitations and capabilit

Evolving MLOps Platforms for Generative AI and Agents with Abhijit Bose - #714 Jan 13, 2025 3488 Today, we're joined by Abhijit Bose, head of enterprise AI and ML platforms at Capital One to discuss the evolution of the company’s approach and insights on Generative AI and platform best practices. In this episode, we dig into the company’s platform-centric approach to AI, and how they’ve been evolving their existing MLOps and data platforms to support the new challenges and opportunities prese

Why Agents Are Stupid & What We Can Do About It with Dan Jeffries - #713 Dec 16, 2024 4129 Today, we're joined by Dan Jeffries, founder and CEO of Kentauros AI to discuss the challenges currently faced by those developing advanced AI agents. We dig into how Dan defines agents and distinguishes them from other similar uses of LLMs, explore various use cases for them, and dig into ways to create smarter agentic systems. Dan shares his “big brain, little brain, tool brain” approach to tackl

Automated Reasoning to Prevent LLM Hallucination with Byron Cook - #712 Dec 9, 2024 3408 Today, we're joined by Byron Cook, VP and distinguished scientist in the Automated Reasoning Group at AWS to dig into the underlying technology behind the newly announced Automated Reasoning Checks feature of Amazon Bedrock Guardrails. Automated Reasoning Checks uses mathematical proofs to help LLM users safeguard against hallucinations. We explore recent advancements in the field of automated rea

AI at the Edge: Qualcomm AI Research at NeurIPS 2024 with Arash Behboodi - #711 Dec 3, 2024 3287 Today, we're joined by Arash Behboodi, director of engineering at Qualcomm AI Research to discuss the papers and workshops Qualcomm will be presenting at this year’s NeurIPS conference. We dig into the challenges and opportunities presented by differentiable simulation in wireless systems, the sciences, and beyond. We also explore recent work that ties conformal prediction to information theory, y

AI for Network Management with Shirley Wu - #710 Nov 19, 2024 3224 Today, we're joined by Shirley Wu, senior director of software engineering at Juniper Networks to discuss how machine learning and artificial intelligence are transforming network management. We explore various use cases where AI and ML are applied to enhance the quality, performance, and efficiency of networks across Juniper’s customers, including diagnosing cable degradation, proactive monitorin

Why Your RAG System Is Broken, and How to Fix It with Jason Liu - #709 Nov 11, 2024 3483 Today, we're joined by Jason Liu, freelance AI consultant, advisor, and creator of the Instructor library to discuss all things retrieval-augmented generation (RAG). We dig into the tactical and strategic challenges companies face with their RAG system, the different signs Jason looks for to identify looming problems, the issues he most commonly encounters, and the steps he takes to diagnose these

An Agentic Mixture of Experts for DevOps with Sunil Mallya - #708 Nov 4, 2024 4509 Today we're joined by Sunil Mallya, CTO and co-founder of Flip AI. We discuss Flip’s incident debugging system for DevOps, which was built using a custom mixture of experts (MoE) large language model (LLM) trained on a novel "CoMELT" observability dataset which combines traditional MELT data—metrics, events, logs, and traces—with code to efficiently identify root failure causes in complex software

Building AI Voice Agents with Scott Stephenson - #707 Oct 28, 2024 3704 Today, we're joined by Scott Stephenson, co-founder and CEO of Deepgram to discuss voice AI agents. We explore the importance of perception, understanding, and interaction and how these key components work together in building intelligent AI voice agents. We discuss the role of multimodal LLMs as well as speech-to-text and text-to-speech models in building AI voice agents, and dig into the benefit

Is Artificial Superintelligence Imminent? with Tim Rocktäschel - #706 Oct 21, 2024 3352 Today, we're joined by Tim Rocktäschel, senior staff research scientist at Google DeepMind, professor of Artificial Intelligence at University College London, and author of the recently published popular science book, “Artificial Intelligence: 10 Things You Should Know.” We dig into the attainability of artificial superintelligence and the path to achieving generalized superhuman capabilities acro

ML Models for Safety-Critical Systems with Lucas García - #705 Oct 14, 2024 4566 Today, we're joined by Lucas García, principal product manager for deep learning at MathWorks to discuss incorporating ML models into safety-critical systems. We begin by exploring the critical role of verification and validation (V&V) in these applications. We review the popular V-model for engineering critical systems and then dig into the “W” adaptation that’s been proposed for incorporating ML

AI Agents: Substance or Snake Oil with Arvind Narayanan - #704 Oct 7, 2024 3262 Today, we're joined by Arvind Narayanan, professor of Computer Science at Princeton University to discuss his recent works, AI Agents That Matter and AI Snake Oil. In “AI Agents That Matter”, we explore the range of agentic behaviors, the challenges in benchmarking agents, and the ‘capability and reliability gap’, which creates risks when deploying AI agents in real-world applications. We also dis

AI Agents for Data Analysis with Shreya Shankar - #703 Sep 30, 2024 2904 Today, we're joined by Shreya Shankar, a PhD student at UC Berkeley to discuss DocETL, a declarative system for building and optimizing LLM-powered data processing pipelines for large-scale and complex document analysis tasks. We explore how DocETL's optimizer architecture works, the intricacies of building agentic systems for data processing, the current landscape of benchmarks for data processin

Stealing Part of a Production Language Model with Nicholas Carlini - #702 Sep 23, 2024 3810 Today, we're joined by Nicholas Carlini, research scientist at Google DeepMind to discuss adversarial machine learning and model security, focusing on his 2024 ICML best paper winner, “Stealing part of a production language model.” We dig into this work, which demonstrated the ability to successfully steal the last layer of production language models including ChatGPT and PaLM-2. Nicholas shares t

Supercharging Developer Productivity with ChatGPT and Claude with Simon Willison - #701 Sep 16, 2024 4455 Today, we're joined by Simon Willison, independent researcher and creator of Datasette to discuss the many ways software developers and engineers can take advantage of large language models (LLMs) to boost their productivity. We dig into Simon’s own workflows and how he uses popular models like ChatGPT and Anthropic’s Claude to write and test hundreds of lines of code while out walking his dog. We

Automated Design of Agentic Systems with Shengran Hu - #700 Sep 2, 2024 3570 Today, we're joined by Shengran Hu, a PhD student at the University of British Columbia, to discuss Automated Design of Agentic Systems (ADAS), an approach focused on automatically creating agentic system designs. We explore the spectrum of agentic behaviors, the motivation for learning all aspects of agentic system design, the key components of the ADAS approach, and how it uses LLMs to design no

The EU AI Act and Mitigating Bias in Automated Decisioning with Peter van der Putten - #699 Aug 27, 2024 2734 Today, we're joined by Peter van der Putten, director of the AI Lab at Pega and assistant professor of AI at Leiden University. We discuss the newly adopted European AI Act and the challenges of applying academic fairness metrics in real-world AI applications. We dig into the key ethical principles behind the Act, its broad definition of AI, and how it categorizes various AI risks. We also discuss

The Building Blocks of Agentic Systems with Harrison Chase - #698 Aug 19, 2024 3557 Today, we're joined by Harrison Chase, co-founder and CEO of LangChain to discuss LLM frameworks, agentic systems, RAG, evaluation, and more. We dig into the elements of a modern LLM framework, including the most productive developer experiences and appropriate levels of abstraction. We dive into agents and agentic systems as well, covering the “spectrum of agenticness,” cognitive architectures, a

Simplifying On-Device AI for Developers with Siddhika Nevrekar - #697 Aug 12, 2024 2797 Today, we're joined by Siddhika Nevrekar, AI Hub head at Qualcomm Technologies, to discuss on-device AI and how to make it easier for developers to take advantage of device capabilities. We unpack the motivations for AI engineers to move model inference from the cloud to local devices, and explore the challenges associated with on-device AI. We dig into the role of hardware solutions, from powerfu

Genie: Generative Interactive Environments with Ashley Edwards - #696 Aug 5, 2024 2811 Today, we're joined by Ashley Edwards, a member of technical staff at Runway, to discuss Genie: Generative Interactive Environments, a system for creating ‘playable’ video environments for training deep reinforcement learning (RL) agents at scale in a completely unsupervised manner. We explore the motivations behind Genie, the challenges of data acquisition for RL, and Genie’s capability to learn

Bridging the Sim2real Gap in Robotics with Marius Memmel - #695 Jul 30, 2024 3441 Today, we're joined by Marius Memmel, a PhD student at the University of Washington, to discuss his research on sim-to-real transfer approaches for developing autonomous robotic agents in unstructured environments. Our conversation focuses on his recent ASID and URDFormer papers. We explore the complexities presented by real-world settings like a cluttered kitchen, data acquisition challenges for

Building Real-World LLM Products with Fine-Tuning and More with Hamel Husain - #694 Jul 23, 2024 4805 Today, we're joined by Hamel Husain, founder of Parlance Labs, to discuss the ins and outs of building real-world products using large language models (LLMs). We kick things off discussing novel applications of LLMs and how to think about modern AI user experiences. We then dig into the key challenge faced by LLM developers—how to iterate from a snazzy demo or proof-of-concept to a working LLM-bas

Mamba, Mamba-2 and Post-Transformer Architectures for Generative AI with Albert Gu - #693 Jul 17, 2024 3474 Today, we're joined by Albert Gu, assistant professor at Carnegie Mellon University, to discuss his research on post-transformer architectures for multi-modal foundation models, with a focus on state-space models in general and Albert’s recent Mamba and Mamba-2 papers in particular. We dig into the efficiency of the attention mechanism and its limitations in handling high-resolution perceptual mod

Decoding Animal Behavior to Train Robots with EgoPet with Amir Bar - #692 Jul 9, 2024 2596 Today, we're joined by Amir Bar, a PhD candidate at Tel Aviv University and UC Berkeley to discuss his research on visual-based learning, including his recent paper, “EgoPet: Egomotion and Interaction Data from an Animal’s Perspective.” Amir shares his research projects focused on self-supervised object detection and analogy reasoning for general computer vision tasks. We also discuss the current
