
The Information Bottleneck
Two AI researchers, Ravid Shwartz-Ziv and Allen Roush, discuss the latest trends, news, and research in Generative AI, LLMs, GPUs, and Cloud Systems. The podcast covers cutting-edge developments in artificial intelligence and machine learning, offering insights from experts in the field.
Episodes
Jürgen Schmidhuber - Part 2: JEPA, the Road to AGI, and Who Really Invented Modern AI
In the second half of our conversation with Jürgen Schmidhuber, we focus on the key ideas he's pursued since the early 1990s and discuss why he believes these concepts are only now being rediscovered. We start with JEPA. Jürgen argues that the method LeCun named in 2022 is the same family he published in 1992 as Predictability Maximization. From there he traces the adversarial lineage back further
Jürgen Schmidhuber - World Models, RL, and the Year that changed AI (Part 1)
In this episode, we host Jürgen Schmidhuber - the man, the legend, one of the godfathers of modern AI. His lab worked out many ideas behind today’s systems (LSTM, world models, artificial curiosity, Transformer variants, and even GAN-style setups) decades before they became fashionable, and he’s just as well known for making sure people remember who did what first. This is the first of two conve
AI for Science and the Thermodynamics of Generative AI - with Max Welling (UvA, CuspAI)
In this episode, we sit with Max Welling, Professor of Machine Learning at the University of Amsterdam, co-founder and CTO of CuspAI, and a foundational figure behind variational autoencoders (VAEs), equivariant networks, and Bayesian deep learning. We talk about AI for science, the physics underneath generative models, and what's still missing on the road to real intelligence. Max starts with what
After Math Falls, What's Next? with Julia Kempe (NYU/Meta)
Julia Kempe on Why Math Will Fall Next, Superhuman Provers, and the Return of the Renaissance Researcher. In this episode, we sit down with Julia Kempe, a Professor at NYU's Center for Data Science and researcher at Meta FAIR's Foundations of Reasoning team, for a wide-ranging conversation on the future of AI research. We dig into why verifiable domains like mathematics may be on track to "fall" the
Language, Cognition, and the Limits of LLMs - with Tal Linzen (NYU/Google)
We host Tal Linzen, Associate Professor at NYU and Research Scientist at Google, for a conversation on the intersection of cognitive science and large language models. We discussed why children can learn language from around 100 million words while LLMs need trillions, and the surprising finding that as models get better at predicting the next word, they become worse models of how humans actually p
Intelligence in an Open World - with Mengye Ren (NYU)
We talk with Mengye Ren, Assistant Professor at NYU's Center for Data Science, about what intelligence actually means once you step outside a benchmark, and why scaling a single centralized model isn't the whole story. We get into why intelligence has to be defined in open environments, not closed ones, and what that means for how we measure progress. We push on the creativity question: today's mod
The Principles of Diffusion Models - with Jesse Lai (Sony AI)
We host Chieh-Hsin (Jesse) Lai, Staff Research Scientist at Sony AI and visiting professor at National Yang Ming Chiao Tung University, Taiwan, for a conversation about diffusion models, the technology behind tools like Stable Diffusion, and most of the AI image and video generators you've seen in the last few years. Jesse recently co-authored The Principles of Diffusion Models with Stefano Ermon,
Inside xAI, and the Bet on AI Math - with Christian Szegedy (Math Inc)
We talked with Christian Szegedy, co-inventor of Inception and Batch Normalization, founding scientist at xAI, now at Math Inc, about what it takes to build a frontier lab, and why he left xAI to work on formal mathematics. Christian thinks Lean and auto-formalization are the missing piece for trustworthy AI: a machine-checkable layer underneath all reasoning, where proofs are guaranteed correct w
Reasoning Models and Planning - with Rao Kambhampati (Arizona State)
We sat down with Rao Kambhampati, a Professor of CS at Arizona State University and former President of AAAI, to talk about reasoning models: what they are, when they work, and when they break. Rao has been working on planning and decision-making since long before deep learning, which makes him one of the most grounded voices on what today's reasoning systems actually do. We start with definitions
What Actually Matters in AI? - with Zhuang Liu (Princeton)
In this episode, we hosted Zhuang Liu, Assistant Professor at Princeton and former researcher at Meta, for a conversation about what actually matters in modern AI and what turns out to be a historical accident. Zhuang is behind some of the most important papers in recent years (with more than 100k citations): ConvNeXt (showing ConvNets can match Transformers if you get the details right), Transform
The Future of Coding Agents with Sasha Rush (Cursor/Cornell)
We talked with Sasha Rush, researcher at Cursor and professor at Cornell, about what it actually feels like to be in the heart of the AI revolution and build coding agents right now. Sasha shared how these systems are changing day-to-day work and how it feels to develop them. A big part of the conversation was about why coding has become such a powerful setting for these tools. We discusse
The Hidden Engine of Vision with Peyman Milanfar (Google)
How Denoising Secretly Powers Everything in AI. Peyman Milanfar is a Distinguished Scientist at Google, leading its Computational Imaging team. He's a member of the National Academy of Engineering, an IEEE Fellow, and one of the key people behind the Pixel camera pipeline. Before Google, he was a professor at UC Santa Cruz for 15 years and helped build the imaging pipeline for Google Glass at Google
Reinventing AI From Scratch with Yaroslav Bulatov
Yaroslav Bulatov helped build the AI era from the inside, as one of the earliest researchers at both OpenAI and Google Brain. Now he wants to tear it all down and start over. Modern deep learning, he argues, is up to 100x more wasteful than it needs to be - a Frankenstein of hacks designed for the wrong hardware. With a power wall approaching in two years, Yaroslav is leading an open effort to r
Why Healthcare Is AI's Hardest and Most Important Problem with Kyunghyun Cho (NYU)
We talk with Kyunghyun Cho, who is a Professor of Health Statistics and a Professor of Computer Science and Data Science at New York University, and a former Executive Director at Genentech, about why healthcare might be the most important and most difficult domain for AI to transform. Kyunghyun shares his vision for a future where patients own their own medical records, proposes a provocative ide
Diffusion LLM & Why the Future of AI Won't Be Autoregressive - Stefano Ermon (Stanford/Inception)
In this episode, we talk with Stefano Ermon, Stanford professor, co-founder & CEO of Inception AI, and co-inventor of DDIM, FlashAttention, DPO, and score-based/diffusion models, about why diffusion-based language models may overtake the autoregressive paradigm that dominates today's LLMs. We start with the fundamental topics, such as what diffusion models actually are, and why iterative refin
Training Is Nothing Like Learning with Naomi Saphra (Harvard)
Naomi Saphra, Kempner Research Fellow at Harvard and incoming Assistant Professor at Boston University, joins us to explain why you can't do interpretability without understanding training dynamics, in the same way you can't do biology without evolution. Naomi argues that many structures researchers find inside trained models are vestigial: they mattered early in training but are meaningless by th
EP28: How to Control a Stochastic Agent with Stefano Soatto (VP AWS / Prof. UCLA)
Stefano Soatto, VP for AI at AWS and Professor at UCLA, the person responsible for agentic AI at AWS, joins us to explain why building reliable AI agents is fundamentally a control theory problem. Stefano sees LLMs as stochastic dynamical systems that need to be controlled, not just prompted. He introduces "strands coding," a new framework AWS is building that sits between vibe coding and spec cod
EP27: Medical Foundation Models - with Tanishq Abraham (Sophont.AI)
Tanishq Abraham, CEO and co-founder of Sophont.ai, joins us to talk about building foundation models specifically for medicine. Sophont is trying to be something like an OpenAI or Anthropic but for healthcare - training models across pathology, neuroimaging, and clinical text, to eventually fuse them into one multimodal system. The surprising part: their pathology model trained on 12,000 public sl
EP26: Measuring Intelligence in the Wild - Arena and the Future of AI Evaluation
Anastasios Angelopoulos, Co-Founder and CEO of Arena AI (formerly LMArena), joins us to talk about why static benchmarks are failing, how human preference data actually works under the hood, and what it takes to be the "gold standard" of AI evaluation. Anastasios sits at a fascinating intersection - a theoretical statistician running the platform that every major lab watches when they release a m
EP25: Personalization, Data, and the Chaos of Fine-Tuning with Fred Sala (UW-Madison / Snorkel AI)
Fred Sala, Assistant Professor at UW-Madison and Chief Scientist at Snorkel AI, joins us to talk about why personalization might be the next frontier for LLMs, why data still matters more than architecture, and how weak supervision refuses to die. Fred sits at a rare intersection, building the theory of data-centric AI in academia while shipping it to enterprise clients at Snorkel. We talk about t
EP24: Can AI Learn to Think About Money? - with Bayan Bruss (Capital One)
Bayan Bruss, VP of Applied AI at Capital One, joins us to talk about building AI systems that can make autonomous financial decisions, and why money might be the hardest problem in machine learning. Bayan leads Capital One's AI Foundations team, where they're working toward a destination most people don't associate with banking: getting AI systems to perceive financial ecosystems, form beliefs abou
EP23: Building Open Source AI Frameworks: David Mezzetti on TxtAI and Local-First AI
David Mezzetti, creator of TxtAI, joins us to talk about building open source AI frameworks as a solo developer - and why local-first AI still matters in the age of API-everything. David's path from running a 50-person IT company through acquisition to building one of the most well-regarded AI orchestration libraries shows how constraints sometimes breed better design. TxtAI started during COV
EP22: Data Curation for LLMs with Cody Blakeney (Datology AI)
Cody Blakeney from Datology AI joins us to talk about data curation - the unglamorous but critical work of figuring out what to actually train models on. Cody's path from writing CUDA kernels to spending his days staring at weird internet text tells you something important: data quality can account for half or more of a model's final performance. That's on par with major architectural breakthroughs
EP21: Privacy in the Age of Agents with Niloofar Mireshghallah
Guest: Niloofar Mireshghallah (Incoming Assistant Professor at CMU, Member of Technical Staff at Humans and AI). In this episode, we dive into AI privacy, frontier model capabilities, and why academia still matters. We kick off by discussing GPT-5.2 and whether models rely more on parametric knowledge or context. Niloofar shares how reasoning models actually defer to context, even accepting obviously
EP20: Yann LeCun
Yann LeCun – Why LLMs Will Never Get Us to AGI. "The path to superintelligence - just train up the LLMs, train on more synthetic data, hire thousands of people to school your system in post-training, invent new tweaks on RL - I think is complete bullshit. It's just never going to work." After 12 years at Meta, Turing Award winner Yann LeCun is betting his legacy on a radically different vision of AI. I
EP19: AI in Finance and Symbolic AI with Atlas Wang
Atlas Wang (UT Austin faculty, XTX Research Director) joins us to explore two fascinating frontiers: the foundations of symbolic AI and the practical challenges of building AI systems for quantitative finance. On the symbolic AI side, Atlas shares his recent work proving that neural networks can learn symbolic equations through gradient descent, a surprising result given that gradient descent is co
EP18: AI Robotics
In this episode, we hosted Judah Goldfeder, a PhD candidate at Columbia University and student researcher at Google, to discuss robotics, reproducibility in ML, and smart buildings. Key topics covered: Robotics challenges: We discussed why robotics remains harder than many expected, compared to LLMs. The real world is unpredictable and unforgiving, and mistakes have physical consequences. Sim-to-rea
EP17: RL with Will Brown
In this episode, we talk with Will Brown, a research lead at Prime Intellect, about his journey into reinforcement learning (RL) and multi-agent systems, exploring their theoretical foundations and practical applications. We discuss the importance of RL in the current LLM pipeline and the challenges it faces. We also discuss applying agentic workflows to real-world applications and the ongoing ev
EP16: AI News and Papers
In this episode, we discuss various topics in AI, including the challenges of the conference review process, the capabilities of Kimi K2 Thinking, the advancements in TPU technology, the significance of real-world data in robotics, and recent innovations in AI research. We also talk about the cool "Chain of Thought Hijacking" paper, how to use simple ideas to scale RL, and the implications of the
EP15: The Information Bottleneck and Scaling Laws with Alex Alemi
In this episode, we sit down with Alex Alemi, an AI researcher at Anthropic (previously at Google Brain and Disney), to explore the powerful framework of the information bottleneck and its profound implications for modern machine learning. We break down what the information bottleneck really means, a principled approach to retaining only the most informative parts of data while compressing away the
EP14: AI News and Papers
In this episode, we talked about AI news and recent papers. We explored the complexities of using AI models in healthcare (the Nature Medicine paper on GPT-5's fragile intelligence in medical contexts). We discussed the delicate balance between leveraging LLMs as powerful research tools and the risks of over-reliance, touching on issues such as hallucinations, medical disagreements among practitio
EP13: Recurrent-Depth Models and Latent Reasoning with Jonas Geiping
In this episode, we host Jonas Geiping from ELLIS Institute & Max-Planck Institute for Intelligent Systems, Tübingen AI Center, Germany. We talked about his broad research on Recurrent-Depth Models and latent reasoning in large language models (LLMs). We talked about what these models can and can't do, what the challenges and next breakthroughs in the field are, world models, and the future of
EP12: Adversarial attacks and compression with Jack Morris
In this episode of the Information Bottleneck Podcast, we host Jack Morris, a PhD student at Cornell, to discuss adversarial examples (Jack created TextAttack, the first software package for LLM jailbreaking), the Platonic representation hypothesis, the implications of inversion techniques, and the role of compression in language models. Links: Jack's Website - https://jxmo.io/ TextAttack - https://a
EP11: JEPA with Randall Balestriero
In this episode we talk with Randall Balestriero, an assistant professor at Brown University. We discuss the potential and challenges of Joint Embedding Predictive Architectures (JEPA). We explore the concept of JEPA, which aims to learn good data representations without reconstruction-based learning. We talk about the importance of understanding and compressing irrelevant details, the role of pre
EP10: Geometric Deep Learning with Michael Bronstein
In this episode, we talked with Michael Bronstein, a professor of AI at the University of Oxford and a scientific director at AITHYRA, about the fascinating world of geometric deep learning. We explored how understanding the geometric structures in data can enhance the efficiency and accuracy of AI models. Michael shared insights on the limitations of small neural networks and the ongoing debate a
EP9: AI in Natural Sciences with Tal Kachman
In this episode we host Tal Kachman, an assistant professor at Radboud University, to explore the fascinating intersection of artificial intelligence and natural sciences. Prof. Kachman's research focuses on multiagent interaction, complex systems, and reinforcement learning. We dive deep into how AI is revolutionizing materials discovery, chemical dynamics modeling, and experimental design throug
EP8: RL with Ahmad Beirami
In this episode, we talked with Ahmad Beirami, a former researcher at Google. We explored the complexities of reinforcement learning, its applications in LLMs, and the evaluation challenges in AI research. We also discussed the dynamics of academic conferences and the broken review system. Finally, we discussed how to integrate theory and practice in AI research and why the
EP7: AI and Neuroscience with Aran Nayebi
In this episode of the "Information Bottleneck" podcast, we hosted Aran Nayebi, an assistant professor at Carnegie Mellon University, to discuss the intersection of computational neuroscience and machine learning. We talked about the challenges and opportunities in understanding intelligence through the lens of both biological and artificial systems. We talked about topics such as the evolution of
EP6: Urban Design Meets AI: With Ariel Noyman
We talked with Ariel Noyman, an urban scientist working at the intersection of cities and technology. Ariel is a research scientist at the MIT Media Lab, exploring novel methods of urban modeling and simulation using AI. We discussed the potential of virtual environments to enhance urban design processes, the challenges associated with them, and the future of utilizing AI. Links: TravelAgent: Gene
EP5: Speculative Decoding with Nadav Timor
We discussed the inference optimization technique known as Speculative Decoding with a world-class researcher, expert, and ex-coworker of the podcast hosts: Nadav Timor. Papers and links: Accelerating LLM Inference with Lossless Speculative Decoding Algorithms for Heterogeneous Vocabularies, Timor et al, ICML 2025, https://arxiv.org/abs/2502.05202 Distributed Speculative Inference (DSI): Speculation
EP4: AI Coding
In this episode, Ravid and Allen discuss the evolving landscape of AI coding. They explore the rise of AI-assisted development tools, the challenges faced in software engineering, and the potential future of AI in creative fields. The conversation highlights both the benefits and limitations of AI in coding, emphasizing the need for careful consideration of its impact on the industry and society. C
EP3: GPU Cloud
Allen and Ravid discuss the dynamics of the extreme demand for GPUs among AI researchers. They also discuss the latest advancements in AI, including Google's Nano Banana and DeepSeek V3.1, exploring the implications of synthetic data, perplexity, and the influence of AI on human communication. They also delve into the challenges faced by AI researchers in the job market, the impo
EP2: PeFT
Allen and Ravid sit down and talk about Parameter Efficient Fine Tuning (PeFT) along with the latest updates in AI/ML news.
EP1: Sampling
Allen and Ravid discuss a topic near and dear to their hearts, LLM Sampling! In this episode of the Information Bottleneck Podcast, Ravid Shwartz-Ziv and Allen Roush discuss the latest developments in AI, focusing on the controversial release of GPT-5 and its implications for users. They explore the future of large language models and the importance of sampling techniques in AI. Chapters: 00:00 Intro











