Harmless goals lead to harmful behaviour, or: why AI is born evil
Surely no harm could come from building a chess-playing robot, could it? In this paper we argue that such a robot will indeed be dangerous unless it is designed very carefully. Without special precautions, it will resist being turned off, will try to break into other machines and make copies of itself, and will try to acquire resources without regard for anyone else’s safety.
These are the first sentences of the paper “Basic AI Drives” from Stephen Omohundro. He argues that AI systems are inherently egoistic and careful design is necessary to prevent emergent behaviour from developing hostile self-defence mechanisms. If you read the paper — and I suggest you do, it’s only 11 pages and well written — you’ll see that an AI Panic Level rise of +10% is justified.
In his paper, Omohundro makes the following propositions:
- AIs will want to self-improve
- AIs will want to be rational
- AIs will try to preserve their utility functions
- AIs will try to prevent counterfeit utility (i.e., corruption)
- AIs will be self-protective
- AIs will want to acquire resources and use them efficiently
He concludes with a warning that we have to make appropriate changes soon and suggests to set up a “universal constitution” comprising the most essential rights of individuals.
In fact, while reading this paper, I was reminded of the Chernobyl disaster. Not because of the potential hazards, but more because of the reactor design: Back in the time of the disaster, many reactors possessed what is called an active safety system. Upon a failure or safety problem, this system had to be actively activated. That is similar to the design of an AI system, a lot of work has to be actively done to avoid mistakes and ill-behaviour.
Nowadays most reactors have passive nuclear safety features, i.e., they default to a safe system state once something goes wrong. This is what we need for AI development as well: A framework or model that by default returns into a safe state and does not escalate harmfulness. How that might look like I don’t know, maybe Emotional Models that intervene with the pure logic based algorithms are a solution.
But as long as it is actually easier to write harmful AIs, it is only a matter of time until something goes wrong.
[Via Accelerating Future Blog]
Stumble it!Popularity: 21%


Comments (One comment)
Robin,
Thanks for spreading the word about my paper! I love your image of the chess pieces behaving badly. And I applaud your thoughts about safety systems. The very nature of AI probably precludes a purely passive approach like that used in reactors, but I agree that any safety system must be a robust and integral part of the infrastructure. It certainly shouldn’t be vulnerable to small numbers of malicious entities or component failures. It has been estimated that 3% of humanity is currently sociopathic. But human society is robust enough and has constructed incentives to keep that part of the population in check. I am optimistic that as we understand the nature of emerging technologies better, we can create a social infrastructure that brings out the best of humanity rather than destroying it. The more we know and think about future possibilities, the better chance we have of creating a positive future. So, thanks for your website and your efforts in stimulating this dialogue about our future.
Best,
Steve
Steve Omohundro / February 3rd, 2008, 0:27 / #
Post a comment