Eye on AI Weekly Research Watch

Craig Spencer Smith | 40 Episodes | Jun 30, 2026

Eye on AI Weekly Research Watch is a weekly podcast that explains significant research papers in artificial intelligence. Each episode breaks down complex AI research into an accessible summary for a broad audience, keeping listeners informed about the latest developments and breakthroughs in the field.

Episodes

Beyond Sparse Supervision: Diffusion-Guided Learning for Few-Shot Graph Fraud Detection
Jun 30, 2026 | 121
Financial fraud detection in transaction networks faces a fundamental challenge: fraudulent activity is rare, well-disguised, and often underrepresented in labeled data. Standard graph neural networks tend to smooth out the very irregularities that signal fraud. ADC-GNN tackles this with three complementary mechanisms: diffusion-guided feature augmentation that stabilizes node representat…

Toward Robust In-Context Segmentation via Concept Guidance
Jun 30, 2026 | 149
In-context segmentation asks a model to identify target regions in new images using only a handful of labeled reference examples, with no retraining required. Current approaches work by matching low-level visual features between references and queries, making them brittle when references vary in viewpoint, lighting, or appearance. CG-ICS instead extracts high-level semantic concepts from refe…

Robust Harmful Features Under Jailbreak Attacks: Mechanistic Evidence from Attention Head Specialization in Large Language Models
Jun 30, 2026 | 181
Jailbreak attacks (prompts engineered to make safety-aligned LLMs produce harmful outputs) are a persistent concern, but exactly how they work mechanistically has remained murky. This paper provides evidence that successful attacks don't erase safety representations; they selectively suppress specific "Adversarially Compromised Heads" in early attention layers while leaving "Safety-Alig…

Tandem Reinforcement Learning with Verifiable Rewards
Jun 30, 2026 | 139
Reinforcement learning has dramatically improved LLM reasoning on tasks like competition math, but the resulting models often reason in ways that are difficult for weaker models or humans to follow, limiting their real-world utility. Tandem Reinforcement Learning (TRL) addresses this by co-training a strong "senior" model alongside a frozen "junior" model: both contribute to generating r…

CPAgents: Agentic Composite Phenotype Generation for Cardiac Disease Association
Jun 30, 2026 | 193
Large-scale studies linking heart imaging measurements to disease risk typically rely on pre-defined, single-variable features chosen by experts, an approach that may miss important non-linear relationships or interactions between measurements. CPAgents automates the discovery of richer, composite phenotypes (ratios, polynomial combinations, interaction terms) through a three-agent loop: …

LLawCo: Learning Laws of Cooperation for Modeling Embodied Multi-Agent Behavior
Jun 30, 2026 | 154
Getting multiple AI agents to work together effectively in a shared physical environment is harder than it sounds: agents frequently act on outdated assumptions about their partners or issue redundant, mistimed communications. LLawCo addresses this by having agents reflect on past failures to extract high-level "laws of cooperation," such as knowing when to speak and when to wait, then f…

Cognitive Episodes in LLM Reasoning Traces Enable Interpretable Human Item Difficulty Prediction
Jun 30, 2026 | 148
Predicting how hard an exam question will be for human test-takers, without running expensive human trials, would transform educational assessment. This paper proposes using the reasoning traces of large language models as a proxy for human cognitive effort. Rather than treating these traces as raw text, Epi2Diff structures them into meaningful "cognitive episodes": functional states l…

The Remittance Blueprint: Data-driven Intelligence for Sri Lanka
Jun 30, 2026 | 163
Remittances (money sent home by migrant workers) are a lifeline for many developing economies, yet surprisingly hard to forecast reliably. This study applies rigorous time-series and machine learning methods to 32 years of Sri Lankan migration and remittance data, finding that external factors like exchange rates and global oil prices drive inflows far more than domestic indicators. A m…

HAT-4D: Lifting Monocular Video for 4D Multi-Object Interactions via Human-Agent Collaboration
Jun 30, 2026 | 160
Building robots that can understand and interact with the physical world requires massive amounts of 3D training data, but capturing that data with multi-camera rigs is expensive and impractical at scale. HAT-4D proposes using ordinary monocular video as a data source, reconstructing the 3D geometry and temporal dynamics of multiple interacting objects with the help of vision-language mo…

Towards Value-Constrained Credit Assignment in Fully Delegated AI Cooperatives
Jun 30, 2026 | 144
As AI systems increasingly act as proxies for human stakeholders in shared learning environments, a thorny question arises: how do you fairly reward each participant's contribution when different contributors have different values, and when some contributions might violate those values? This paper proposes a framework that filters gradient updates by each principal's value profile before …

Exposure Bias Can Alleviate Itself via Directional and Frequency Rectification in Flow Matching
Jun 30, 2026 | 160
Flow matching is a powerful framework for generating images and other data by learning to map noise to structure, but it suffers from a training-inference mismatch: models are trained on clean trajectories but must operate on drifted ones at test time. DEFAR turns this problem on its head, treating the drift itself as a useful signal. It uses the bias to learn corrective directions and to …

Govern the Repository, Not the Agent: Measuring Ecosystem-Level Risk in AI-Native Software
Jun 30, 2026 | 174
When AI agents autonomously write and merge code at scale, the usual way of evaluating them (task by task, in isolation) misses something important: the cumulative friction and technical debt that builds up in shared codebases over time. Studying over 930,000 agent-authored pull requests, this paper finds that about half of "integration friction" is a property of the repository ecosyste…
