Ben Goertzel Interviews Paul Werbos on Existential Risks
The idea that advanced technology may pose a risk to the very survival of the human race (an “existential risk”) is hardly a new one – but it’s a difficult topic to think about, both because of its emotional impact, and because of its wildly cross-disciplinary nature. It’s a subject demanding thinkers who are ruthless in rationality, polymathic in scope, and sensitive to the multitudinous dimensions of human nature. Dr. Paul Werbos is one such thinker, and this interview probes some of his views on the greatest risks posed by technology to humanity in the foreseeable future.
Paul is a Program Director in the US National Science Foundation (NSF)’s Division of Electrical, Communications & Cyber Systems. His responsibilities at the NSF include the Adaptive and Intelligent Systems (AIS) area within the Power, Controls and Adaptive Networks (PCAN) Program, and the new area of Quantum, Molecular and High-Performance Modeling and Simulation for Devices and Systems.
His reputation as a scientist was established early on, by his 1974 Harvard University Ph.D. thesis, which gave an early description of the process of training artificial neural networks through backpropagation of errors. He was one of the original three two-year Presidents of the International Neural Network Society (INNS), and has since remained active in the neural network field, along with a host of other scientific areas in artificial intelligence, quantum physics, space science, and other domains.
In addition to his NSF administration work and his scientific research, Paul has led multiple initiatives aimed at exploring or improving the near, medium and long-term future of the human race. He serves on the Planning Committee of the AC/UNU Millennium Project, whose annual report on the future tends to lead global lists of respected reports on the long-term future; and he has served on multiple committees related to space exploration and colonization. And (like me) he is on the Advisory Board of the Lifeboat Foundation, an organization devoted to understanding and minimizing existential risks, and is a very active participant on their online mailing list.
His website werbos.com reviews “6 MegaChallenges for the 21st Century”:
- What is Mind? (how to build/understand intelligence)
- How does the Universe work? (Quantum physics…)
- What is Life? (e.g., quantitative systems biotechnology)
- Sustainable growth on Earth
- Cost-effective sustainable space settlement
- Human potential — growth/learning in brain, soul, integration (body)
and presents both original ideas and literature reviews on all these areas; I encourage you to check it out and spend some time there.
Given this amazing diversity and depth, there was a great deal Paul Werbos and I could have talked about during an interview – but I was particularly interested to delve into his views on existential risks.
I’ve really appreciated reading your comments on the Lifeboat Foundation email list about the risks facing humanity as we move forward (as well as a host of other topics). So I’m happy to have the chance to interview you on the topic – to dig into some of the details, and hopefully get at the crux of your perspective.
These are very important matters to discuss. But I have to begin with the caveat that anything I say here is not an official NSF view, and that what I say must be oversimplified even as a reflection of my own views, because of the inherent complexity of this issue.
Sure … understood.
So, I’ll start off with a little tiny innocuous question: What do you think is a reasonable short-list of the biggest existential risks facing humanity during the next century?
Number one by far: the risk that bad trends may start to cause an abuse of nuclear (or biological) weapons, so bad that we either slowly make the earth uninhabitable for humans or, at the opposite extreme, enter an anti-technology dark age which is brittle and unsustainable.
For now, I focus most of my own efforts here on trying to help us get to a sustainable global energy system, as soon as possible — but I agree with www.stateofthefuture.org that population issues, water issues, and subtler issues like human potential, cultural progress and sustainable economic growth are also critical.
Number two, not far behind: I really think that the “Terminator II scenario” is far closer to reality than most people think. It scares me sometimes how strong the incentives are pushing people like me to make such things happen, and how high the price can be to resist. But at the same time, we will also need better understanding of ourselves in order to navigate all these challenges, and that means we do need to develop the underlying mathematics and understanding, even if society seems to be demanding the exact opposite.
Also, we have to consider how the risks of “artificial stupidity” can be just as great as those of artificial intelligence; for example, if systems like crude voicemail systems start to control more and more of the world, that can be bad too.
Tied for number three:
- The bad consequences of using wires or microwave to directly perturb the primary reinforcement centers of the brain, thereby essentially converting human beings into robotic appliances — in the spirit of “Clone Wars.” The same people who once told us that frontal lobotomies are perfectly safe and harmless are essentially doing it all again.
- The risk of generating black holes or other major catastrophes, if we start to do really interesting physics experiments here on the surface of the earth. Lack of imagination in setting up experiments may save us for some time, but space would be a much safer place for anything really scaling up to be truly interesting. This adds to the many other reasons why I wish we would go ahead and take the next critical steps in reducing the cost of access to space.
And just how serious do you think these risks are? How dangerous is our situation, from the perspective of the survival of humanity?
(Which, to be noted, some would say is a narrow perspective, arguing that if we create transhuman AIs that discover great things, have great experiences and populate the universe, the human race has done its job whether or not any humans survive!)
In the Humanity 3000 seminar in 2005, organized by the Foundation for the Future, I remember the final session — a debate on the proposition “humanity will survive that long after all, or not.” At the time, I got into trouble (as I usually do!) by saying we would be fools EITHER to agree or disagree — to attribute either a probability of one or a probability of zero. There are natural incentives out there for experts to pretend to know more than they do, and to “adopt a position” rather than admit to a probability distribution, as applied decision theorists like Howard Raiffa have explained how to do. How can we work rationally to increase the probability of human survival, if we pretend that we already know the outcome, and that nothing we do can change it?
But to be honest… under present conditions, I might find that sort of “Will humanity survive or not?” debate more useful (if conducted with a few more caveats). Because lately it becomes more and more difficult for me to make out where the possible light at the end of the tunnel really is – it becomes harder for me to argue for a nonzero probability. Sometimes, in mathematics or engineering, the effort to really prove rigorously that something is impossible can be very useful in locating the key loopholes which make it possible after all — but only for those who understand the loopholes.
So you’re arguing for the intellectual value of arguing that humanity’s survival is effectively impossible?
But on the other hand, sometimes that approach ends up just being depressing, and sometimes we just have to wing it as best we can.
Indeed, humanity has been pretty good at winging it so far – though admittedly we’re venturing into radically new territory.
My next question has to do, not so much with the risks humanity faces, as with humanity’s perception of those risks. What do you think are the biggest misconceptions about existential risk, among the general population?
There are so many misconceptions in so many diverse places, it’s hard to know where to begin!
I guess I’d say that the biggest, most universal problem is people’s feeling that they can solve these problems either by “killing the bad guys” or by coming up with magic incantations or arguments which make the problems go away.
The problem is not so much a matter of people’s beliefs, as of people’s sense of reality, which may be crystal clear within a few feet of their face, but gets a lot fuzzier as one moves out further in space and time. Vaillant of Harvard has done an excellent study of the different kinds of defense mechanisms people use to deal with stress and bafflement. Some of them tend to lead to great success, while others, like rage and denial, lead to self-destruction. Overuse of denial, and lack of clear sane thinking in general, get in the way of performance and self-preservation at many levels of life.
People may imagine that their other activities will continue somehow, and be successful, even if those folks on planet earth succeed in killing themselves off.
So let’s dig a little deeper into the risks associated with powerful AI – my own main research area, as you know.
One view on the future of AI and the Singularity is that there is an irreducible uncertainty attached to the creation of dramatically greater than human intelligence. That is, in this view, there probably isn’t really any way to eliminate or drastically mitigate the existential risk involved in creating superhuman AGI. So, in this view, building superhuman AI is essentially plunging into the Great Unknown and swallowing the risk because of the potential reward (where the reward may be future human benefit, or something else like the creation of aesthetically or morally pleasing superhuman beings, etc.).
Another view is that if we engineer and/or educate our AGI systems correctly, we can drastically mitigate the existential risk associated with superhuman AGI, and create a superhuman AGI that’s highly unlikely to pose an existential risk to humanity. What are your thoughts on these two views? Do you have an intuition on which one is more nearly correct? (Or do you think both are wrong?) By what evidence or lines of thought is your intuition on this informed/inspired?
Your first view does not say how superhuman intelligence would turn out, really.
I agree more with the first viewpoint. The term “great unknown” is not inappropriate here.
People who think they can reliably control something a million times smarter than they are — are … not in touch with the reality of such a situation.
Nor are they in touch with how intelligence works. It’s pretty clear from the math, though I feel uncomfortable with going too far into the goriest details.
The key point is that any real intelligence will ultimately have some kind of utility function system U built into it. Whatever you pick, you have to live with the consequences – including the full range of ways in which an intelligent system can get at whatever you choose. Most options end up being pretty ghastly if you look closely enough.
You can’t build a truly “friendly AI” just by hooking a computer up to a Mr. Potato Head with a “smile” command.
It doesn’t work that way.
Well, not every AI system has to be built explicitly as a reinforcement learning system. Reinforcement learning based on seeking to maximize utility functions is surely going to be an aspect of any intelligent system, but an AI system doesn’t need to be built to rigidly possess a certain utility function and ceaselessly seek to maximize it. I wrote a blog post some time ago arguing against the dangers of the pure reinforcement learning approach.
After all, humans are only imperfectly modeled as utility-maximizers. We’re reasonably good at subverting our inherited or acquired goals, sometimes in unpredictable ways.
But of course a self-organizing or quasi-randomly drifting superhuman AI isn’t necessarily any better than a remorseless utility maximizer. My hope is to create beneficial AI systems via a combination of providing them with reasonable in-built goals, and teaching them practical everyday empathy and morality by interacting with them, much as we do with young children.
If I were to try to think of a non-short-circuited “friendly AI”… the most plausible thing I can think of is a logical development that might well occur if certain “new thrusts” in robotics really happen, and really exploit the very best algorithms that some of us know about. I remember Shibata’s “seal pup” robot, a furry friendly thing, with a reinforcement learning system inside, connected to try to maximize the number of petting strokes it gets from humans. If people work really hard on “robotic companions” – I do not see any real symbolic communication in the near future, but I do see ways to get full nonverbal intelligence, tactile and movement intelligence far beyond even the human norm. (Animal noises and smells included.) So the best-selling robotic companions (if we follow the marketplace) would probably have the most esthetically pleasing human forms possible, contain fully embodied intelligence (probably the safest kind), and be incredibly well-focused and effective in maximizing the amount of tactile feedback they get from specific human owners.
Who needs six syllable words to analyze such a scenario, to first order? No need to fog out on metaformalisms.
If you have a sense of reality, you know what I am talking about by now. It has some small, partial reflection in the movie “AI.”
What would our congresspeople do if they learned this was the likely real outcome of certain research efforts? My immediate speculation — first, strong righteous speeches against it; second, zeroing out the budget for it; third, trying hard to procure samples for their own use, presumably smuggled from China.
But how benign would it really be? Some folks would immediately imagine lots of immediate benefit.
Others might cynically say that this would not be the first time that superior intelligences were held in bondage and made useful to those less intelligent than they are, just by tactile feedback and such. But in fact, one-track minds could create difficulties somewhat more problematic… and it quickly becomes an issue just who ends up in bondage to whom.
So, a warm friendly cuddly robot helper, that learns to be nice via interacting with us and becoming part of the family? Certainly this would be popular and valuable – and I think feasible in the foreseeable future.
My own view is, I don’t think this is an end goal for advanced AI, but I think it could be a reasonable step along the path, and a way for AIs to absorb some human values and get some empathy for us – before AI takes off and leaves us in the dust intelligence-wise.
Another approach that’s been suggested, in order to mitigate various existential risks, is to create a sort of highly intelligent “AGI Nanny” or “Singularity Steward.” This would be a roughly human-level AGI system without capability for dramatic self-modification, and with strong surveillance powers, given the task of watching everything that humans do and trying to ensure that nothing extraordinarily dangerous happens. One could envision this as a quasi-permanent situation, or else as a temporary fix to be put into place while more research is done regarding how to launch a Singularity safely.
What are your views on this AI Nanny scenario?
So the idea is to insert a kind of danger detector into the computer, a detector which serves as the utility function?
How would one design the danger detector?
If the human race could generate a truly credible danger detector, that most of us would agree to, that would already be interesting enough in itself.
Of course, an old Asimov story described one scenario for what that would do. If the primary source of danger is humans, then the easiest way to eliminate it is to eliminate them.
And I can also imagine what some of the folks in Congress would say about the international community developing an ultrapowerful ultrananny for the entire globe.
Yet another proposal that’s been suggested, to mitigate the potential existential risk of human-level or superhuman AGIs, is to create a community of AGIs and have them interact with each other, comprising a society with its own policing mechanisms and social norms and so forth. The different AGIs would then keep each other in line. A “social safety net” so to speak.
My impression is that this could work OK so long as the AGIs in the community could all understand each other fairly well – i.e. none was drastically smarter than all the others; and none was so different from all the others that its peers couldn’t tell if it was about to become dramatically smarter. But in that case, the question arises whether the conformity involved in maintaining a viable “social safety net” as described above is somehow too stifling. A lot of progress in human society has been made by outlier individuals thinking very differently than the norm, and incomprehensible to their peers – but this sort of different-ness seems to inevitably pose risk, whether among AGIs or humans.
It is far from obvious to me that more conformity of thought implies more safety or even more harmony.
Neurons in a brain basically have perfect harmony, at some level, through a kind of division of labor — in which new ideas and different attention are an essential part of the division of labor.
Uniformity in a brain implies a low effective bit rate, and thus a low ability to cope effectively with complex challenges.
I am worried that we are ALREADY having problems dealing with constructive diversity in our society, growing problems in some ways.
And finally, getting back to the more near-term and practical, I have to ask: What do you think society could be doing now to better mitigate existential risks?
For threat number one — basically, avoiding the nuclear end game — society could be doing many, many things, some of which I talk about in some detail at www.werbos.com/energy.htm.
For the other threats — I would be a lot more optimistic if I could see pathways to society really helping.
One thing that pops into my mind would be the possibility of extending the kind of research effort that NSF supported under COPN, which aimed at really deepening our fundamental understanding, in a cross-cutting way, avoiding the pitfalls of trying to build Terminators or of trying to control people better with wires onto their brains.
I have often wondered what could be done to provide more support for real human sanity and human potential (which is NOT a matter of ensuring conformity or making us all into a reliable workforce). I don’t see one, simple magical answer, but there is a huge amount of unmet potential there.
www.stateofthefuture.org talks a lot about trying to build more collective intelligence … trying to create streams of dialogue (both inner and outer?), where we can actually pose key questions and stick with them, with social support for that kind of thinking activity.
Yes, I can certainly see the value in that – more and more collective intelligence, moving us toward a coherent sort of Global Brain capable of dealing with species-level challenges, can certainly help us mitigate existential risk. Though exactly how much it can help seems currently difficult to say.
Well, thanks very much for your responses – and good luck to you in all your efforts!