EA - My Objections to "We’re All Gonna Die with Eliezer Yudkowsky" by Quintin Pope

The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: My Objections to "We’re All Gonna Die with Eliezer Yudkowsky", published by Quintin Pope on March 21, 2023 on The Effective Altruism Forum.

Note: manually cross-posted from LessWrong. See here for discussion on LW.

Introduction

I recently watched Eliezer Yudkowsky's appearance on the Bankless podcast, where he argued that AI was nigh-certain to end humanity. Since the podcast, some commentators have offered pushback against the doom conclusion. However, one sentiment I saw was that optimists tended not to engage with the specific arguments pessimists like Yudkowsky offered.

Economist Robin Hanson points out that this pattern is very common for small groups which hold counterintuitive beliefs: insiders develop their own internal language, which skeptical outsiders usually don't bother to learn. Outsiders then make objections that focus on broad arguments against the belief's plausibility, rather than objections that focus on specific insider arguments.

As an AI "alignment insider" whose current estimate of doom is around 5%, I wrote this post to explain some of my many objections to Yudkowsky's specific arguments. I've split this post into chronologically ordered segments of the podcast in which Yudkowsky makes one or more claims with which I particularly disagree. All bulleted points correspond to specific claims by Yudkowsky, and I follow each bullet point with text that explains my objections to the claims in question.

I have my own view of alignment research: shard theory, which focuses on understanding how human values form, and on how we might guide a similar process of value formation in AI systems. I think that human value formation is not that complex, and does not rely on principles very different from those which underlie the current deep learning paradigm. Most of the arguments you're about to see from me are less:

I think I know of a fundamentally new paradigm that can fix the issues Yudkowsky is pointing at.

and more:

Here's why I don't agree with Yudkowsky's arguments that alignment is impossible in the current paradigm.

My objections

Will current approaches scale to AGI?

Yudkowsky apparently thinks not, and that the techniques driving current state-of-the-art advances, by which I think he means the mix of generative pretraining plus small amounts of reinforcement learning such as with ChatGPT, aren't reliable enough for significant economic contributions. However, he also thinks that the current influx of money might stumble upon something that does work really well, which will end the world shortly thereafter.

I'm a lot more bullish on the current paradigm. People have tried lots and lots of approaches to getting good performance out of computers, including lots of "scary seeming" approaches such as:

- Meta-learning over training processes, i.e., using gradient descent over learning curves to directly optimize neural networks to learn more quickly (sketched in code after this list).
- Teaching neural networks to directly modify themselves by giving them edit access to their own weights.
- Training learned optimizers - neural networks that learn to optimize other neural networks - and having those learned optimizers optimize themselves.
- Using program search to find more efficient optimizers.
- Using simulated evolution to find more efficient architectures.
- Using efficient second-order corrections to gradient descent's approximate optimization process.
- Applying biologically plausible optimization algorithms inspired by biological neurons to training neural networks.
- Adding learned internal optimizers (different from the ones hypothesized in Risks from Learned Optimization) as neural network layers.
- Having language models rewrite their own training data, and improve the quality of that training data, to make themselves better at a given task.
- Having language models devise their own programming...
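To make the first item above concrete, here is a minimal illustrative sketch (in JAX) of meta-learning over a training process: unroll a few steps of inner-loop gradient descent on a toy regression task, then take the gradient of the final loss with respect to the inner learning rate, so the training procedure itself can be optimized. The toy task and names such as unrolled_training are my own placeholders, not code from any particular paper.

```python
# Minimal sketch, assuming a toy linear-regression inner task.
import jax
import jax.numpy as jnp

def inner_loss(w, x, y):
    # Simple squared-error loss for the toy inner task.
    return jnp.mean((x @ w - y) ** 2)

def unrolled_training(inner_lr, w_init, x, y, steps=5):
    # Run a few steps of plain gradient descent, keeping everything
    # differentiable so we can backprop through the whole trajectory.
    w = w_init
    for _ in range(steps):
        g = jax.grad(inner_loss)(w, x, y)
        w = w - inner_lr * g
    # Loss after training is the meta-objective.
    return inner_loss(w, x, y)

# Toy data and initialization (illustrative only).
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (32, 3))
true_w = jnp.array([1.0, -2.0, 0.5])
y = x @ true_w
w_init = jnp.zeros(3)

# Meta-gradient: how the post-training loss changes as we change the
# inner learning rate. An outer optimizer would descend on this.
meta_grad = jax.grad(unrolled_training)(0.1, w_init, x, y)
print("d(final loss)/d(inner_lr) =", meta_grad)
```

The "scary seeming" part is that the gradient flows through the entire learning curve, so the outer optimization directly shapes how the inner learner learns; the same pattern extends to optimizing initializations, architectures, or full learned optimizers.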
