"I Invented the Transformer. Now I'm Replacing It." & Continuous Thought Machines - Llion Jones and Luke Darlow [Sakana AI]

Machine Learning Street Talk (MLST) - A podcast by Machine Learning Street Talk (MLST)



The Transformer architecture (which powers ChatGPT and nearly all modern AI) might be trapping the industry in a localized rut, preventing us from finding true intelligent reasoning, according to the person who co-invented it. Llion Jones and Luke Darlow, key figures at the research lab Sakana AI, join the show to make this provocative argument, and also introduce new research which might lead the way forwards.

**SPONSOR MESSAGES START**
Build your ideas with AI Studio from Google - http://ai.studio/build
Tufa AI Labs is hiring ML Research Engineers - https://tufalabs.ai/
cyber•Fund https://cyber.fund/?utm_source=mlst is a founder-led investment firm accelerating the cybernetic economy
Hiring a SF VC Principal: https://talent.cyber.fund/companies/cyber-fund-2/jobs/57674170-ai-investment-principal#content?utm_source=mlst
Submit investment deck: https://cyber.fund/contact?utm_source=mlst
**END**

The "Spiral" Problem: Llion uses a striking visual analogy to explain what current AI is missing. If you ask a standard neural network to understand a spiral shape, it solves it by drawing tiny straight lines that just happen to look like a spiral. It "fakes" the shape without understanding the concept of spiraling.

Introducing the Continuous Thought Machine (CTM): Luke Darlow deep dives into their solution, a biology-inspired model that fundamentally changes how AI processes information.

The Maze Analogy: Luke explains that standard AI tries to solve a maze by staring at the whole image and guessing the entire path instantly. Their new machine "walks" through the maze step-by-step.

Thinking Time: This allows the AI to "ponder." If a problem is hard, the model can naturally spend more time thinking about it before answering, effectively allowing it to correct its own mistakes and backtrack, something current Language Models struggle to do genuinely.

https://sakana.ai/
https://x.com/YesThisIsLion
https://x.com/LearningLukeD

TRANSCRIPT:
https://app.rescript.info/public/share/crjzQ-Jo2FQsJc97xsBdfzfOIeMONpg0TFBuCgV2Fu8

TOC:
00:00:00 - Stepping Back from Transformers
00:00:43 - Introduction to Continuous Thought Machines (CTM)
00:01:09 - The Changing Atmosphere of AI Research
00:04:13 - Sakana's Philosophy: Research Freedom
00:07:45 - The Local Minimum of Large Language Models
00:18:30 - Representation Problems: The Spiral Example
00:29:12 - Technical Deep Dive: CTM Architecture
00:36:00 - Adaptive Computation & Maze Solving
00:47:15 - Model Calibration & Uncertainty
01:00:43 - Sudoku Bench: Measuring True Reasoning

REFS:
Why Greatness Cannot Be Planned [Kenneth Stanley]
https://www.amazon.co.uk/Why-Greatness-Cannot-Planned-Objective/dp/3319155237
https://www.youtube.com/watch?v=lhYGXYeMq_E

The Hardware Lottery [Sara Hooker]
https://arxiv.org/abs/2009.06489
https://www.youtube.com/watch?v=sQFxbQ7ade0

Continuous Thought Machines [Luke Darlow et al / Sakana]
https://arxiv.org/abs/2505.05522
https://sakana.ai/ctm/

LSTM: The Comeback Story? [Prof. Sepp Hochreiter]
https://www.youtube.com/watch?v=8u2pW2zZLCs

Questioning Representational Optimism in Deep Learning: The Fractured Entangled Representation Hypothesis [Kumar/Stanley]
https://arxiv.org/pdf/2505.11581

A Spline Theory of Deep Networks [Randall Balestriero]
https://proceedings.mlr.press/v80/balestriero18b/balestriero18b.pdf
https://www.youtube.com/watch?v=86ib0sfdFtw
https://www.youtube.com/watch?v=l3O2J3LMxqI

On the Biology of a Large Language Model [Anthropic, Jack Lindsey et al]
https://transformer-circuits.pub/2025/attribution-graphs/biology.html

The ARC Prize 2024 Winning Algorithm [Daniel Franzen and Jan Disselhoff] "The ARChitects"
https://www.youtube.com/watch?v=mTX_sAq--zY

Neural Turing Machine [Graves]
https://arxiv.org/pdf/1410.5401

Adaptive Computation Time for Recurrent Neural Networks [Graves]
https://arxiv.org/abs/1603.08983

Sudoku Bench [Sakana]
https://pub.sakana.ai/sudoku/
