EA - What could an AI-caused existential catastrophe actually look like? by Benjamin Hilton
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: What could an AI-caused existential catastrophe actually look like?, published by Benjamin Hilton on September 12, 2022 on The Effective Altruism Forum.

This article forms part of 80,000 Hours' explanation of risks from artificial intelligence, and focuses on how an AI system could cause an existential catastrophe. Our full problem profile on risks from AI looks at why we’re worried things like this will happen.

At 5:29 AM on July 16, 1945, deep in the Jornada del Muerto desert in New Mexico, the Manhattan Project carried out the world’s first successful test of a nuclear weapon. From that moment, we’ve had the technological capacity to wipe out humanity.

But if you asked someone in 1945 to predict exactly how this risk would play out, they would almost certainly have got it wrong. They might have expected nuclear weapons to see more widespread use in World War II. They certainly would not have predicted the fall of the USSR 45 years later. Current experts are concerned about India–Pakistan nuclear conflict and North Korean state action, but 1945 was before even the partition of India or the Korean War. That is to say, you’d have real difficulty predicting anything about how nuclear weapons would be used. It would have been even harder to make these predictions in 1933, when Leo Szilard first realised that a nuclear chain reaction of immense power could be possible, without any concrete idea of what these weapons would look like.

Despite this difficulty, you wouldn’t be wrong to be concerned.

In our problem profile on AI, we describe a very general way in which advancing AI could go wrong. But there are lots of specifics we can’t know much about at this point. Maybe there will be a single transformative AI system, or maybe there will be many; there could be very fast growth in the capabilities of AI, or very slow growth. Each scenario will look a little different, and carry different risks. And the specific problems that arise in any one scenario are necessarily less likely to happen than the overall risk.

Despite not knowing how things will play out, it may still be useful to look at some concrete possibilities for how things could go wrong. In particular, we argued in the full profile that sufficiently advanced systems might be able to take power away from humans — how could that possibly happen?

How could a power-seeking AI actually take power?

Here are seven possible techniques that could be used by a power-seeking AI (or multiple AI systems working together) to actually gain power. These techniques could all interact with one another, and it’s difficult to say at this point (years or decades before the technology exists) which are most likely to be used. Also, systems more intelligent than humans could develop plans to seek power that we haven’t yet thought of.

1. Hacking

Software is absolutely full of vulnerabilities. The US National Institute of Standards and Technology reported over 18,000 vulnerabilities found in systems across the world in 2021 — an average of 50 per day. Most of these are small, but every so often one is used to cause huge chaos. The list of most expensive crypto hacks keeps getting new entrants — as of March 2022, the largest was $624 million stolen from Ronin Network. And nobody noticed for six days.
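To make the idea of a software vulnerability concrete, here is a minimal sketch (not from the original article) of one of the most common classes of bug: a stack buffer overflow in C. The function name and buffer size are purely illustrative; the point is simply that a single missing bounds check lets an attacker write into memory they should never touch.

```c
#include <stdio.h>
#include <string.h>

/* Illustrative stack buffer overflow: strcpy() copies attacker-controlled
 * input into a fixed-size buffer with no bounds check. Any input longer
 * than 15 characters overwrites adjacent stack memory; in real exploits,
 * that overwrite can redirect the program's control flow. */
void greet(const char *name) {
    char buffer[16];
    strcpy(buffer, name);   /* no length check: this is the vulnerability */
    printf("Hello, %s\n", buffer);
}

int main(int argc, char **argv) {
    if (argc > 1) {
        greet(argv[1]);     /* argv[1] comes straight from the user */
    }
    return 0;
}
```

The fix is equally small (for instance, snprintf(buffer, sizeof buffer, "%s", name)), which is part of why thousands of such bugs ship every year despite the class being well understood.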
One expert we spoke to said that professional ‘red teams’ — security staff whose job it is to find vulnerabilities in systems — frequently manage to infiltrate their clients’ systems, including crucial and powerful infrastructure like banks and national energy grids. In 2010, the Stuxnet virus successfully destroyed Iranian nuclear enrichment centrifuges — despite these centrifuges being completely disconnected from the internet — marking the first time a piece of malware was used to cause physical damage. A Russian hack in 20...
