Best AI papers explained

A podcast by Enoch H. Kang

550 Episodes

The Alternative Annotator Test for LLM-as-a-Judge: How to Statistically Justify Replacing Human Annotators with LLMs
Published: 09/05/2025
Limits to scalable evaluation at the frontier: LLM as Judge won’t beat twice the data
Published: 09/05/2025
Stratified Prediction-Powered Inference for Hybrid Language Model Evaluation
Published: 09/05/2025
Accelerating Unbiased LLM Evaluation via Synthetic Feedback
Published: 09/05/2025
Prediction-Powered Statistical Inference Framework
Published: 09/05/2025
Optimizing Chain-of-Thought Reasoners via Gradient Variance Minimization in Rejection Sampling and RL
Published: 09/05/2025
RM-R1: Reward Modeling as Reasoning
Published: 09/05/2025
Reexamining the Aleatoric and Epistemic Uncertainty Dichotomy
Published: 08/05/2025
Decoding Claude Code: Terminal Agent for Developers
Published: 07/05/2025
Emergent Strategic AI Equilibrium from Pre-trained Reasoning
Published: 07/05/2025
Benefiting from Proprietary Data with Siloed Training
Published: 06/05/2025
Advantage Alignment Algorithms
Published: 06/05/2025
Asymptotic Safety Guarantees Based On Scalable Oversight
Published: 06/05/2025
What Makes a Reward Model a Good Teacher? An Optimization Perspective
Published: 06/05/2025
Towards Guaranteed Safe AI: A Framework for Ensuring Robust and Reliable AI Systems
Published: 06/05/2025
Identifiable Steering via Sparse Autoencoding of Multi-Concept Shifts
Published: 06/05/2025
You Are What You Eat - AI Alignment Requires Understanding How Data Shapes Structure and Generalisation
Published: 06/05/2025
Interplay of LLMs in Information Retrieval Evaluation
Published: 03/05/2025
Trade-Offs Between Tasks Induced by Capacity Constraints Bound the Scope of Intelligence
Published: 03/05/2025
Toward Efficient Exploration by Large Language Model Agents
Published: 03/05/2025

Cut through the noise. We curate and break down the most important AI papers so you don’t have to.
