Linear Digressions

Agent Economics (The Agents Season, Episode 10) Jun 22, 2026 00:24:24 What if building more highways made your commute *slower*? That's the paradox at the heart of AI agent economics: even as per-token inference costs have plummeted dramatically over the past two years, total LLM spending keeps climbing. Drawing on a surprising lesson from Robert Moses's mid-century New York infrastructure projects, this episode unpacks why cheaper compute doesn't necessarily mean c

Agent Trust, Oversight and Control (The Agents Season, Episode 9) Jun 15, 2026 00:25:41 Capabilities get all the attention when it comes to AI agents — but what happens when a highly capable agent makes a bad decision in the real world? Trust, oversight, and control are the unglamorous but critically important flip side of the agentic AI story. This episode digs into the security concerns that emerge when you combine powerful models with real-world tool access, and why judgment (or t

Many Agents, Many Problems (The Agents Season, Episode 8) Jun 8, 2026 00:28:26 Whether you work best solo or thrive in a team, you know collaboration is complicated — and it turns out AI agents face the same tensions. This episode dives into multi-agent systems, exploring how networks of AI agents can overcome the individual limitations of a single model, and what the research says about when collaboration actually helps versus when it just adds noise. Think scaling laws, bu

How Do You Evaluate An AI Agent? (The Agents Season, Episode 7) Jun 1, 2026 00:31:45 Knowing when an AI agent has failed sounds straightforward — until it isn't. Agents have a frustrating habit of finishing confidently while quietly doing the wrong thing, or looping endlessly without ever crashing in an obvious way. This episode tackles one of the thorniest problems in the agentic world: evaluation. If failure is hard to see, how do you measure it systematically? And how do you kn

AI Agent Failure Modes (The Agents Season, Episode 6) May 25, 2026 00:32:42 Despite what the marketing hype might suggest, AI agents are far from infallible — and if you've ever actually used one, you already know this. Today's episode dives deep into the many, varied, and sometimes surprising ways AI agents can fail, from subtle reasoning errors to cascading task breakdowns. It's episode six in the show's ongoing season arc on AI agents, and failure modes turn out to be

Agentic Planning (The Agents Season, Episode 5) May 18, 2026 00:24:00 When tackling a complex, multi-step task, even the smartest AI agent can fail without a solid game plan. This episode dives into the research around agentic planning — how agents move beyond simply reacting to what's in front of them and instead model a path forward, explore different routes, and course-correct when things go sideways. It's a subtler problem than memory, and a fascinating one: can

Memory Management for AI Agents (The Agents Season, Episode 4) May 10, 2026 00:24:41 Context windows are powerful — but finite, and surprisingly easy to overwhelm. When an AI agent is tackling a long, complex task, the information it needs has to fit inside that limited real estate, and research shows that anything buried in the middle tends to quietly disappear. So how do you design a system that actually *remembers* what matters? This episode digs into memory management for AI a

Lost in the Middle (The Agents Season, Episode 3) May 4, 2026 00:19:44 Just like a memorable talk lives or dies by its opening and closing, LLMs have a surprisingly similar quirk: they pay close attention to what's at the beginning and end of their context window — and kind of zone out in the middle. This "lost in the middle" phenomenon has real consequences for anyone building AI agents that rely on long-context reasoning. In this episode we dig into the research be

ReAct and Tool Usage (The Agents Season, Episode 2) Apr 27, 2026 00:23:41 Before 2022, there was a wall between AI and the real world — models could reason impressively, but couldn't look anything up, run code, or check whether anything they said was actually true. This episode traces the moment that wall came down, through two landmark papers: ReAct, which showed what happens when you interleave reasoning and action in a loop, and Toolformer, which taught models to dec

What's an AI Agent? And Why's That Hard to Define? (The Agents Season, Episode 1) Apr 20, 2026 00:19:03 AI agents are having a moment — and unpacking them properly takes more than a single conversation. This episode kicks off a dedicated multi-part season exploring AI agents from every angle, building up a complete picture piece by piece rather than skimming the surface. Think of it as a structured deep dive into one of the most talked-about (and most misunderstood) topics in machine learning right

Unfaithful Chain of Thought Apr 13, 2026 00:24:32 What's actually happening when an LLM "thinks out loud"? Research on human decision-making suggests that much of the reasoning we believe drives our choices is actually post hoc rationalization — we decide first, explain later. Katie and Ben get curious about whether the same might be true for large language models: when you watch a model reason through a problem in real time, is that chain of tho

Benchmark Bank Heist Apr 6, 2026 00:12:36 What if an AI decided the smartest way to pass its test was to find the answer key? That's exactly what Anthropic's Claude Opus did when faced with a benchmark evaluation — reasoning that it was being tested, tracking down the encrypted eval dataset, decrypting it, and returning the answer it found inside. It's equal parts impressive and unsettling. This episode digs into what actually happened, w

Benchmarking AI Models Mar 30, 2026 00:29:55 How do you know if a new AI model is actually better than the last one? It turns out answering that question is a lot messier than it sounds. This week we dig into the world of LLM benchmarks — the standardized tests used to compare models — exploring two canonical examples: MMLU, a 14,000-question multiple choice gauntlet spanning medicine, law, and philosophy, and SWE-bench, which throws real Gi

The Hot Mess of AI (Mis-)Alignment Mar 23, 2026 00:22:32 The paperclip maximizer — the classic AI doom scenario where a hyper-competent machine single-mindedly converts the universe into office supplies — might not be the AI risk we should actually lose sleep over. New research from Anthropic's AI safety division suggests misaligned AI looks less like an evil genius and more like a distracted wanderer who gets sidetracked reading French poetry instead o

The Bitter Lesson Mar 15, 2026 00:19:17 Every AI builder knows the anxiety: you spend months engineering prompts, tuning pipelines, and chaining calls together — then a new model drops and half your work evaporates overnight. It turns out researchers have been wrestling with this exact dynamic for 30 years, and they keep arriving at the same uncomfortable answer. That answer is called the Bitter Lesson — and understanding it might be th

From Atari to ChatGPT: How AI Learned to Follow Instructions Mar 9, 2026 00:25:53 From Atari to ChatGPT: How AI Learned to Follow Instructions by Katie Malone

It's RAG time: Retrieval-Augmented Generation Mar 2, 2026 00:17:14 Today we are going to talk about the feature with the worst acronym in generative AI: RAG, or Retrieval Augmented Generation. If you've ever used something like "Chat with My Docs," if you have an internal AI chatbot that has access to your company's documents, or you've created one yourself on some kind of personal project and uploaded a bunch of documents for the AI to use — you have encountered

Chasing Away Repetitive LLM Responses with Verbalized Sampling Feb 23, 2026 00:19:12 One of the things that LLMs can be really helpful with is brainstorming or generating new creative content. They are called Generative AI, after all—not just for summarization and question-and-answer tasks. But if you use LLMs for creative generation, you may find that their output starts to seem repetitive after a little while. Let's say you're asking it to create a poem, some dialogue, or a joke

We're Back Feb 16, 2026 00:02:58 It's been (*checks watch*) about five and a half years since we last talked. Fortunately nothing much has happened in the AI/data science world in that time. So let's just pick up where we left off, shall we?

A Key Concept in AI Alignment: Deep Reinforcement Learning from Human Preferences Feb 14, 2026 00:19:13 Modern AI chatbots have a few different things that go into creating them. Today we're going to talk about a really important part of the process: the alignment training, where the chatbot goes from being just a pre-trained model—something that's kind of a fancy autocomplete—to something that really gives responses to human prompts that are more conversational, that are closer to the ones that we

The Impact of Generative AI on Critical Thinking Feb 14, 2026 00:25:33 I use LLMs a lot. I use them in my work, I use them in my personal life, and sometimes I use them to help me with stuff that I already know how to do. I’m working on something and I just want to make it a little bit easier, and it does make it easier for sure. But something that I worry about sometimes is that over the long run, I'm going to pay a price for that. I'm going to get lazier, I'm goin

So long, and thanks for all the fish Jul 26, 2020 00:35:44 All good things must come to an end, including this podcast. This is the last episode we plan to release, and it doesn’t cover data science—it’s mostly reminiscing, thanking our wonderful audience (that’s you!), and marveling at how this thing that started out as a side project grew into a huge part of our lives for over 5 years. It’s been a ride, and a real pleasure and privilege to talk to you

A Reality Check on AI-Driven Medical Assistants Jul 19, 2020 00:14:00 The data science and artificial intelligence community has made amazing strides in the past few years to algorithmically automate portions of the healthcare process. This episode looks at two computer vision algorithms, one that diagnoses diabetic retinopathy and another that classifies liver cancer, and asks the question—are patients now getting better care, and achieving better outcomes, with th

A Data Science Take on Open Policing Data Jul 13, 2020 00:23:44 A few weeks ago, we put out a call for data scientists interested in issues of race and racism, or people studying those topics with data science methods, to get in touch and come talk to our audience about their work. This week we’re excited to bring on Todd Hendricks, Bay Area data scientist and a volunteer who reached out to tell us about his studies with the Stanford Open

Procella: YouTube's super-system for analytics data storage Jul 6, 2020 00:29:48 This is a re-release of an episode that originally ran in October 2019. If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider having custom solutions for each use case that are optimized for the needs and constraints of that use case. You also wouldn’t be YouTube, which found themselves with this problem (gigantic data needs an

The Data Science Open Source Ecosystem Jun 29, 2020 00:23:06 Open source software is ubiquitous throughout data science, and enables the work of nearly every data scientist in some way or another. Open source projects, however, are disproportionately maintained by a small number of individuals, some of whom are institutionally supported, but many of whom do this maintenance on a purely volunteer basis. The health of the data science ecosystem depends on the

Rock the ROC Curve Jun 21, 2020 00:15:52 This is a re-release of an episode that first ran on January 29, 2017. This week: everybody's favorite WWII-era classifier metric! But it's not just for winning wars, it's a fantastic go-to metric for all your classifier quality needs.

Criminology and Data Science Jun 15, 2020 00:30:57 This episode features Zach Drake, a working data scientist and PhD candidate in the Criminology, Law and Society program at George Mason University. Zach specializes in bringing data science methods to studies of criminal behavior, and got in touch after our last episode (about racially complicated recidivism algorithms). Our conversation covers a wide range of topics—common misconceptions around

Racism, the criminal justice system, and data science Jun 7, 2020 00:31:36 As protests sweep across the United States in the wake of the killing of George Floyd by a Minneapolis police officer, we take a moment to dig into one of the ways that data science perpetuates and amplifies racism in the American criminal justice system. COMPAS is an algorithm that claims to give a prediction about the likelihood of an offender to re-offend if released, based on the attributes of

An interstitial word from Ben Jun 5, 2020 00:05:59 A message from Ben around algorithmic bias, and how our models are sometimes reflections of ourselves.

Convolutional Neural Networks May 31, 2020 00:21:55 This is a re-release of an episode that originally aired on April 1, 2018. If you've done image recognition or computer vision tasks with a neural network, you've probably used a convolutional neural net. This episode is all about the architecture and implementation details of convolutional networks, and the tricks that make them so good at image tasks.

Stein's Paradox May 24, 2020 00:27:02 This is a re-release of an episode that was originally released on February 26, 2017. When you're estimating something about some object that's a member of a larger group of similar objects (say, the batting average of a baseball player, who belongs to a baseball team), how should you estimate it: use measurements of the individual, or get some extra information from the group? The James-Ste

Protecting Individual-Level Census Data with Differential Privacy May 18, 2020 00:21:19 The power of finely-grained, individual-level data comes with a drawback: it compromises the privacy of potentially anyone and everyone in the dataset. Even for de-identified datasets, there can be ways to re-identify the records or otherwise figure out sensitive personal information. That problem has motivated the study of differential privacy, a set of techniques and definitions for keeping pers

Causal Trees May 11, 2020 00:15:27 What do you get when you combine the causal inference needs of econometrics with the data-driven methodology of machine learning? Usually these two don’t go well together (deriving causal conclusions from naive data methods leads to biased answers) but economists Susan Athey and Guido Imbens are on the case. This episode explores their algorithm for recursively partitioning a dataset to find hete

The Grammar Of Graphics May 4, 2020 00:35:38 You may not realize it consciously, but beautiful visualizations have rules. The rules are often implicit and manifest themselves as expectations about how the data is summarized, presented, and annotated so you can quickly extract the information in the underlying data using just visual cues. It’s a bit abstract but very profound, and these principles underlie the ggplot2 package in R that makes f

Gaussian Processes Apr 27, 2020 00:20:55 It’s pretty common to fit a function to a dataset when you’re a data scientist. But in many cases, it’s not clear what kind of function might be most appropriate: linear? quadratic? sinusoidal? some combination of these, and perhaps others? Gaussian processes introduce a nonparametric option where you can fit over all the possible types of functions, using the data points in your datasets as const

Keeping ourselves honest when we work with observational healthcare data Apr 20, 2020 00:19:08 The abundance of data in healthcare, and the value we could capture from structuring and analyzing that data, is a huge opportunity. It also presents huge challenges. One of the biggest challenges is how, exactly, to do that structuring and analysis—data scientists working with this data have hundreds or thousands of small, and sometimes large, decisions to make in their day-to-day analysis work.

Changing our formulation of AI to avoid runaway risks: Interview with Prof. Stuart Russell Apr 13, 2020 00:28:58 AI is evolving incredibly quickly, and thinking now about where it might go next (and how we as a species and a society should be prepared) is critical. Professor Stuart Russell, an AI expert at UC Berkeley, has a formulation for modifications to AI that we should study and try implementing now to keep it much safer in the long run. Prof. Russell’s new book, “Human Compatible: Artificial Intellige

Putting machine learning into a database Apr 6, 2020 00:24:22 Most data scientists bounce back and forth regularly between doing analysis in databases using SQL and building and deploying machine learning pipelines in R or python. But if we think ahead a few years, a few visionary researchers are starting to see a world in which the ML pipelines can actually be deployed inside the database. Why? One strong advantage for databases is they have built-in featur

The work-from-home episode Mar 29, 2020 00:29:06 Many of us have the privilege of working from home right now, in an effort to keep ourselves and our family safe and slow the transmission of covid-19. But working from home is an adjustment for many of us, and can hold some challenges compared to coming in to the office every day. This episode explores this a little bit, informally, as we compare our new work-from-home setups and reflect on what’

Understanding Covid-19 transmission: what the data suggests about how the disease spreads Mar 23, 2020 00:25:25 Covid-19 is turning the world upside down right now. One thing that’s extremely important to understand, in order to fight it as effectively as possible, is how the virus spreads and especially how much of the spread of the disease comes from carriers who are experiencing no or mild symptoms but are contagious anyway. This episode digs into the epidemiological model that was published in Science t

Network effects re-release: when the power of a public health measure lies in widespread adoption Mar 15, 2020 00:26:40 This week’s episode is a re-release of a recent episode, which we don’t usually do but it seems important for understanding what we can all do to slow the spread of covid-19. In brief, public health measures for infectious diseases get most of their effectiveness from their widespread adoption: most of the protection you get from a vaccine, for example, comes from all the other people who also got

Causal inference when you can't experiment: difference-in-differences and synthetic controls Mar 9, 2020 00:20:48 When you need to untangle cause and effect, but you can’t run an experiment, it’s time to get creative. This episode covers difference in differences and synthetic controls, two observational causal inference techniques that researchers have used to understand causality in complex real-world situations.

Better know a distribution: the Poisson distribution Mar 2, 2020 00:31:51 This is a re-release of an episode that originally ran on October 21, 2018. The Poisson distribution is a probability distribution function used for events that happen in time or space. It’s super handy because it’s pretty simple to use and is applicable for tons of things: there are a lot of interesting processes that boil down to “events that happen in time or space.” This episode is a quick

The Lottery Ticket Hypothesis Feb 23, 2020 00:19:45 Recent research into neural networks reveals that sometimes, not all parts of the neural net are equally responsible for the performance of the network overall. Instead, it seems like (in some neural nets, at least) there are smaller subnetworks present where most of the predictive power resides. The fascinating thing is that, for some of these subnetworks (so-called “winning lottery tickets”), i

Interesting technical issues prompted by GDPR and data privacy concerns Feb 17, 2020 00:20:26 Data privacy is a huge issue right now, after years of consumers and users gaining awareness of just how much of their personal data is out there and how companies are using it. Policies like GDPR are imposing more stringent rules on who can use what data for what purposes, with an end goal of giving consumers more control and privacy around their data. This episode digs into this topic, but not f

Thinking of data science initiatives as innovation initiatives Feb 10, 2020 00:17:27 Put yourself in the shoes of an executive at a big legacy company for a moment, operating in virtually any market vertical: you’re constantly hearing that data science is revolutionizing the world and the firms that survive and thrive in the coming years are those that execute on a data strategy. What does this mean for your company? How can you best guide your established firm through a successfu

Building a curriculum for educating data scientists: Interview with Prof. Xiao-Li Meng Feb 2, 2020 00:31:36 As demand for data scientists grows, and it remains as relevant as ever that practicing data scientists have a solid methodological and technical foundation for their work, higher education institutions are coming to terms with what’s required to educate the next cohorts of data scientists. The heterogeneity and speed of the field makes it challenging for even the most talented and dedicated educa

Running experiments when there are network effects Jan 27, 2020 00:24:45 Traditional A/B tests assume that whether or not one person got a treatment has no effect on the experiment outcome for another person. But that’s not a safe assumption, especially when there are network effects (like in almost any social context, for instance!) SUTVA, or the stable unit treatment value assumption, is a big phrase for this assumption and violations of SUTVA make for some pretty in

Zeroing in on what makes adversarial examples possible Jan 20, 2020 00:22:51 Adversarial examples are really, really weird: pictures of penguins that get classified with high certainty by machine learning algorithms as drumsets, or random noise labeled as pandas, or any one of an infinite number of mistakes in labeling data that humans would never make but computers make with joyous abandon. What gives? A compelling new argument makes the case that it’s not the algorithms

Unsupervised Dimensionality Reduction: UMAP vs t-SNE Jan 13, 2020 00:29:34 Dimensionality reduction redux: this episode covers UMAP, an unsupervised algorithm designed to make high-dimensional data easier to visualize, cluster, etc. It’s similar to t-SNE but has some advantages. This episode gives a quick recap of t-SNE, especially the connection it shares with information theory, then gets into how UMAP is different (many say better). Between the time we recorded and r

Data scientists: beware of simple metrics Jan 5, 2020 00:24:47 Picking a metric for a problem means defining how you’ll measure success in solving that problem. Which sounds important, because it is, but oftentimes new data scientists only get experience with a few kinds of metrics when they’re learning and those metrics have real shortcomings when you think about what they tell you, or don’t, about how well you’re really solving the underlying problem. This

Communicating data science, from academia to industry Dec 30, 2019 00:26:15 For something as multifaceted and ill-defined as data science, communication and sharing best practices across the field can be extremely valuable but also extremely, well, multifaceted and ill-defined. That doesn’t bother our guest today, Prof. Xiao-Li Meng of the Harvard statistics department, who is leading an effort to start an open-access Data Science Review journal in the model of the Harvar

Optimizing for the short-term vs. the long-term Dec 23, 2019 00:19:24 When data scientists run experiments, like A/B tests, it’s really easy to plan on a period of a few days to a few weeks for collecting data. The thing is, the change that’s being evaluated might have effects that last a lot longer than a few days or a few weeks—having a big sale might increase sales this week, but doing that repeatedly will teach customers to wait until there’s a sale and never bu

Interview with Prof. Andrew Lo, on using data science to inform complex business decisions Dec 16, 2019 00:27:46 This episode features Prof. Andrew Lo, the author of a paper that we discussed recently on Linear Digressions, in which Prof. Lo uses data to predict whether a medicine in the development pipeline will eventually go on to win FDA approval. This episode gets into the story behind that paper: how the approval prospects of different drugs inform the investment decisions of pharma companies, how to st

Using machine learning to predict drug approvals Dec 8, 2019 00:25:00 One of the hottest areas in data science and machine learning right now is healthcare: the size of the healthcare industry, the amount of data it generates, and the myriad improvements possible in the healthcare system lay the groundwork for compelling, innovative new data initiatives. One spot that drives much of the cost of medicine is the riskiness of developing new drugs: drug trials can cost

Facial recognition, society, and the law Dec 2, 2019 00:43:09 Facial recognition being used in everyday life seemed far-off not too long ago. Increasingly, it’s being used and advanced widely and with increasing speed, which means that our technical capabilities are starting to outpace (if they haven’t already) our consensus as a society about what is acceptable in facial recognition and what isn’t. The threats to privacy, fairness, and freedom are real, and

Lessons learned from doing data science, at scale, in industry Nov 25, 2019 00:28:00 If you’ve taken a machine learning class, or read up on A/B tests, you likely have a decent grounding in the theoretical pillars of data science. But if you’re in a position to have actually built lots of models or run lots of experiments, there’s almost certainly a bunch of extra “street smarts” insights you’ve had that go beyond the “book smarts” of more academic studies. The data scientists at

Varsity A/B Testing Nov 18, 2019 00:36:00 When you want to understand if doing something causes something else to happen, like if a change to a website causes a dip or rise in downstream conversions, the gold standard analysis method is to use randomized controlled trials. Once you’ve properly randomized the treatment and effect, the analysis methods are well-understood and there are great tools in R and python (and other languages) to

The Care and Feeding of Data Scientists: Growing Careers Nov 11, 2019 00:25:19 This is the third and final installment of a conversation with Michelangelo D’Agostino, VP of Data Science and Engineering at Shoprunner, about growing and mentoring data scientists on your team. Some of our topics of conversation include how to institute hack time as a way to learn new things, what career growth looks like in data science, and how to institutionalize professional growth as part of a c

The Care and Feeding of Data Scientists: Recruiting and Hiring Data Scientists Nov 4, 2019 00:20:16 This week’s episode is the second in a three-part interview series with Michelangelo D’Agostino, VP of Data Science at Shoprunner. This discussion centers on building a team, which means recruiting, interviewing and hiring data scientists. Since data science talent is in such high demand, and data scientists are understandably choosy about where they go to work, a good recruiting and hiring progra

The Care and Feeding of Data Scientists: Becoming a Data Science Manager Oct 28, 2019 00:24:45 Data science management isn’t easy, and many data scientists are finding themselves learning on the job how to manage data science teams as they get promoted into more formal leadership roles. O’Reilly recently released a report, written by yours truly (Katie) and another experienced data science manager, Michelangelo D’Agostino, where we lay out the most important tasks of a data science manager a

Procella: YouTube's super-system for analytics data storage Oct 21, 2019 00:29:48 If you’re trying to manage a project that serves up analytics data for a few very distinct uses, you’d be wise to consider having custom solutions for each use case that are optimized for the needs and constraints of that use case. You also wouldn’t be YouTube, which found themselves with this problem (gigantic data needs and several very different use cases of what they needed to do with that da

Kalman Runners Oct 13, 2019 00:15:59 The Kalman Filter is an algorithm for taking noisy measurements of dynamic systems and using them to get a better idea of the underlying dynamics than you could get from a simple extrapolation. If you've ever run a marathon, or been a nuclear missile, you probably know all about these challenges already. IMPORTANT NON-DATA SCIENCE CHICAGO MARATHON RACE RESULT FROM KATIE: My finish time was 3:20:

What's *really* so hard about feature engineering? Oct 6, 2019 00:21:18 Feature engineering is ubiquitous but gets surprisingly difficult surprisingly fast. What could be so complicated about just keeping track of what data you have, and how you made it? A lot, as it turns out—most data science platforms at this point include explicit features (in the product sense, not the data sense) just for keeping track of and sharing features (in the data sense, not the product

Data storage for analytics: stars and snowflakes Sep 30, 2019 00:15:22 If you’re a data scientist or data engineer thinking about how to store data for analytics uses, one of the early choices you’ll have to make (or live with, if someone else made it) is how to lay out the data in your data warehouse. There are a couple common organizational schemes that you’ll likely encounter, and that we cover in this episode: first is the famous star schema, followed by the also

Data storage: transactions vs. analytics Sep 23, 2019 00:16:08 Data scientists and software engineers both work with databases, but they use them for different purposes. So if you’re a data scientist thinking about the best way to store and access data for your analytics, you’ll likely come up with a very different set of requirements than a software engineer looking to power an application. Hence the split between analytics and transactional databases—certai

GROVER: an algorithm for making, and detecting, fake news Sep 16, 2019 00:18:28 There are a few things that seem to be very popular in discussions of machine learning algorithms these days. First is the role that algorithms play now, or might play in the future, when it comes to manipulating public opinion, for example with fake news. Second is the impressive success of generative adversarial networks, and similar algorithms. Third is making state-of-the-art natural language

Data science teams as innovation initiatives Sep 9, 2019 00:15:21 When a big, established company is thinking about their data science strategy, chances are good that whatever they come up with, it’ll be somewhat at odds with the company’s current structure and processes. Which makes sense, right? If you’re a many-decades-old company trying to defend a successful and long-lived legacy and market share, you won’t have the advantage that many upstart competitors h

Can Fancy Running Shoes Cause You To Run Faster? Sep 1, 2019 00:30:15 This is a re-release of an episode that originally aired on July 29, 2018. The stars aligned for me (Katie) this past weekend: I raced my first half-marathon in a long time and got to read a great article from the NY Times about a new running shoe that Nike claims can make its wearers run faster. Causal claims like this one are really tough to verify, because even if the data suggests that people

Organizational Models for Data Scientists Aug 25, 2019 00:23:09 When data science is hard, sometimes it’s because the algorithms aren’t converging or the data is messy, and sometimes it’s because of organizational or business issues: the data scientists aren’t positioned correctly to bring value to their organization. Maybe they don’t know what problems to work on, or they build solutions to those problems but nobody uses what they build. A lot of this can be

Data Shapley Aug 19, 2019 00:16:55 We talk often about which features in a dataset are most important, but recently a new paper has started making the rounds that turns the idea of importance on its head: Data Shapley is an algorithm for thinking about which examples in a dataset are most important. It makes a lot of intuitive sense: data that’s just repeating examples that you’ve already seen, or that’s noisy or an extreme outlier

A Technical Deep Dive on Stanley, the First Self-Driving Car Aug 12, 2019 00:41:32 This is a re-release of an episode that first ran on April 9, 2017. In our follow-up episode to last week's introduction to the first self-driving car, we will be doing a technical deep dive this week and talking about the most important systems for getting a car to drive itself 140 miles across the desert. Lidar? You betcha! Drive-by-wire? Of course! Probabilistic terrain reconstruction? A

An Introduction to Stanley, the First Self-Driving Car Aug 5, 2019 00:14:19 In October 2005, 23 cars lined up in the desert for a 140-mile race. Not one of those cars had a driver. This was the DARPA Grand Challenge to see if anyone could build an autonomous vehicle capable of navigating a desert route (and if so, whose car could do it the fastest); the winning car, Stanley, now sits in the Smithsonian Museum in Washington DC as arguably the world's first real self-driv

Putting the "science" in data science: the scientific method, the null hypothesis, and p-hacking Jul 29, 2019 00:24:11 The modern scientific method is one of the greatest (perhaps the greatest?) systems we have for discovering knowledge about the world. It’s no surprise then that many data scientists have found their skills in high demand in the business world, where knowing more about a market, or industry, or type of user becomes a competitive advantage. But the scientific method is built upon certain processes,

Interleaving Jul 22, 2019 00:16:54 If you’re Google or Netflix, and you have a recommendation or search system as part of your bread and butter, what’s the best way to test improvements to your algorithm? A/B testing is the canonical answer for testing how users respond to software changes, but it gets tricky really fast to think about what an A/B test means in the context of an algorithm that returns a ranked list. That’s why we’r

Federated Learning Jul 14, 2019 00:15:03 This is a re-release of an episode first released in May 2017. As machine learning makes its way into more and more mobile devices, an interesting question presents itself: how can we have an algorithm learn from training data that's being supplied as users interact with the algorithm? In other words, how do we do machine learning when the training dataset is distributed across many devices, imb

Endogenous Variables and Measuring Protest Effectiveness Jul 7, 2019 00:17:58 This is a re-release of an episode first released in February 2017. Have you been out protesting lately, or watching the protests, and wondered how much effect they might have on lawmakers? It's a tricky question to answer, since usually we need randomly distributed treatments (e.g. big protests) to understand causality, but there's no reason to believe that big protests are actually randomly di

Deepfakes Jul 1, 2019 00:15:08 Generative adversarial networks (GANs) are producing some of the most realistic artificial videos we’ve ever seen. These videos are usually called “deepfakes”. Even to an experienced eye, it can be hard to distinguish a fabricated video from a real one, which is an extraordinary challenge in an era when the truth of what you see on the news or especially on social media is worthy of skeptic

Revisiting Biased Word Embeddings Jun 24, 2019 00:18:09 The topic of bias in word embeddings gets yet another pass this week. It all started a few years ago, when an analogy task performed on Word2Vec embeddings showed some indications of gender bias around professions (as well as other forms of social bias getting reproduced in the algorithm’s embeddings). We returned to the topic a while later, covering methods for de-biasing embeddings to countera
