Mitigating the Risks of Artificial Superintelligence

“Existential risk” refers to the risk that the human race as a whole might be annihilated.  In other words: human extinction risk, or species-level genocide.  This is an important concept because, as terrible as it would be if 90% of the human race were annihilated, wiping out 100% is a whole different matter.

Existential risk is not a fully well-defined notion, because as transhumanist technologies advance, the border between human and nonhuman becomes increasingly difficult to distinguish.  If humans somehow voluntarily “transcend” their humanity and become superhuman, this seems a different sort of scenario than everyone being nuked to death.  However, philosophical concerns aside, there are sufficiently many clear potential avenues to human extinction to make the “existential risk” concept valuable, including nanotech arms races, risks associated with unethical superhuman AIs, and more mundane risks involving biological or nuclear warfare.  While one doesn’t wish to approach the future with an attitude of fearfulness, it’s also important to keep our eyes open to the very real dangers that loom.

Michael Anissimov ranks among the voices most prominent and effective in discussing the issue of existential risk, along with other issues related to the Singularity and the future of humanity and technology.   Currently the Media Director for the Singularity Institute, as well as a Board member of Humanity+, Michael is Co-Organizer of the Singularity Summit and a member of the Center for Responsible Nanotechnology’s Global Task Force.   His blog Accelerating Future is deservedly popular, featuring in-depth discussion of many important issues related to transhumanism.

The following quote summarizes some of Michael’s high-level views on existential risk:

I cannot emphasize this enough. If an existential disaster occurs, not only will the possibilities of extreme life extension, sophisticated nanotechnology, intelligence enhancement, and space expansion never bear fruit, but everyone will be dead, never to come back. This would be awful. Because we have so much to lose, existential risk is worth worrying about even if our estimated probability of occurrence is extremely low.

Existential risk creates a ‘loafer problem’ — we always expect someone else to handle it. I assert that this is a dangerous strategy and should be discarded in favor of making prevention of such risks a central focus.

In this dialogue I aimed to probe a little deeper, getting at Michael’s views on the specific nature of the risks associated with specific technologies (especially AI), and what we might do to combat them.   I knew this would be an interesting interview, because I’d talked informally with Michael about these ideas a few times before, so I knew we had many areas of disagreement along with broad areas of concurrence.   So the interview veers from vehement agreement into some friendly debate – I hope you’ll enjoy it!


What do you think is a reasonable short-list of the biggest existential risks facing humanity during the next century?


1.      Unfriendly AI.

2.      Selfish uploads.

3.      Molecular manufacturing arms race.


What do you think are the biggest *misconceptions* regarding existential risk, both among individuals in the futurist community broadly conceived and among the general public?


Underestimating the significance of superintelligence.  People have a delusion that humanity is some theoretically optimum plateau of intelligence (due to brainwashing from Judeo-Christian theological ideas, which also permeate so-called “secular humanism”), which is the opposite of the truth.  We’re actually among the stupidest possible species smart enough to launch a civilization.


One view on the future of AI and the Singularity is that there is an irreducible uncertainty attached to the creation of dramatically greater than human intelligence.  That is, in this view, there probably isn’t really any way to eliminate or drastically mitigate the existential risk involved in creating superhuman AGI. So, in this view, building superhuman AI is essentially plunging into the Great Unknown and swallowing the risk because of the potential reward (where the reward may be future human benefit, or something else like the creation of aesthetically or morally pleasing superhuman beings, etc.).  Another view is that if we engineer and/or educate our AGI systems correctly, we can drastically mitigate the existential risk associated with superhuman AGI, and create a superhuman AGI that’s highly unlikely to pose an existential risk to humanity.  What are your thoughts on these two views?  Do you have an intuition on which one is more nearly correct?  (Or do you think both are wrong?)  By what evidence or lines of thought is your intuition on this informed/inspired?


Would you rather your AI be based on Hitler or Gandhi?


Can I vote for a random member of Devo instead?


My point is, if you have any preference, that proves you understand that there’s some correlation between a seed AI and the Singularity it grows into.

Imagine that AGI were impossible.  Imagine we would have to choose a human being to become the first superintelligence. Say that we knew that that human would acquire power that put her above all others — say, she had the guaranteed ability to charm and brainwash everyone she came into contact with, and direct them to follow her commands. If that had to be the case, then I would advise that we choose someone with as much innate kindness and cleverness as possible. Someone that really cared for humanity as a whole, and had an appreciation for abstract philosophical and moral issues. Someone that was mostly selfless, and understood that moral realism is false.  Someone who followed the axioms of probability theory in their reasoning — someone who systematically makes accurate probability estimates, rather than demonstrating  overconfidence, underconfidence, or framing biases.

This is the future of the entire light cone we’re talking about: the Galaxy, the Virgo Cluster, and beyond.  Whether we like it or not, many think it’s likely that the first superintelligence would become a singleton, implicitly taking responsibility for the future development of civilization from that point on.  Now, it may be that we feel emotional aversion to the idea of a singleton, but this doesn’t alter the texture of the fitness landscape.  The first superintelligence may, in fact, be able to elevate itself to singleton status quite quickly. (Say, through rapid self-replication or perhaps rapid replication of its ideas.)  If it can, then we have to do our best to plan for that eventuality, whether or not we personally like it.


By the way, I wonder how you define a “singleton”?

I’m personally not sure the concept even applies to radically non-human AI systems.  The individual-versus-group dichotomy works for us minds that are uniquely associated with specific physical bodies with narrow inter-body communication bandwidth, but will it hold up for AGIs?


Nick Bostrom covered this in “What is a Singleton?”:

In set theory, a singleton is a set with only one member, but as I introduced the notion, the term refers to a world order in which there is a single decision-making agency at the highest level.  Among its powers would be (1) the ability to prevent any threats (internal or external) to its own existence and supremacy, and (2) the ability to exert effective control over major features of its domain (including taxation and territorial allocation).  …

A democratic world republic could be a kind of singleton, as could a world dictatorship. A friendly superintelligent machine could be another kind of singleton, assuming it was powerful enough that no other entity could threaten its existence or thwart its plans. A “transcending upload” that achieves world domination would be another example.

The idea is around a single decision-making agency.  That agency could be made up of trillions of sub-agents, as long as they demonstrated harmony on making the highest level decisions, and prevented Tragedies of the Commons.  Thus, a democratic world republic could be a singleton.


Well, the precise definition of “harmony” in this context isn’t terribly clear to me either.  But at a high level, sure, I understand: a singleton is supposed to have a higher degree of unity associated with its internal decision-making processes, compared to a non-singleton intelligent entity….

I think there are a lot of possibilities for the future of AGI, but I can see that a singleton AGI mind is one relatively likely outcome – so we do need to plan with this possibility in mind.


Yes, and it’s “conservative” to assume that artificial intelligence will ascend in power very quickly, for reasons of prudence. Pursuit of the Singularity should be connected with an abundance of caution. General intelligence is the most powerful force in the universe, after all.

Human morality and “common sense” are extremely complex and peculiar information structures.  If we want to ensure continuity between our world and a world with AGI, we need to transfer over our “metamorals” at high fidelity.  Read the first chapter of Steven Pinker’s How the Mind Works to see what I’m getting at.  As Marvin Minsky said, “Easy things are hard!”  “Facts” that are “obvious” to infants would be extremely complicated to specify in code.  “Obvious” morality, like “don’t kill people if you don’t have to” is extremely complicated, but seems deceptively simple to us, because we have the brainware to compute it intuitively.  We have to give AGIs goal systems that are compatible with our continued existence, or we will be destroyed.  Certain basic drives common across many different kinds of AGIs may prove inconvenient to ourselves when the AGIs implementing them are extremely powerful and do not obey human commands.

To quote the co-founder of the World Transhumanist Association, Nick Bostrom:

The option to defer many decisions to the superintelligence does not mean that we can afford to be complacent in how we construct the superintelligence. On the contrary, the setting up of initial conditions, and in particular the selection of a top-level goal for the superintelligence, is of the utmost importance. Our entire future may hinge on how we solve these problems.

Words worth taking seriously… we only have one chance to get this right.


This quote seems to imply a certain class of approaches to creating superintelligence, i.e. one in which the concept of a “top level goal” has a meaning.  On the other hand one could argue that humans don’t really have top-level goals, though one can apply “top level goals” as a crude conceptual model of some aspects of what humans do.  Do you think humans have top-level goals?  Do you think it’s necessary for a superintelligence to have a structured goal hierarchy with a top-level goal, in order for it to have reasonably high odds of turning out positively according to human standards?  [By the way, my own AI architecture does involve an explicit top-level goal, so I don’t ask this from a position of being radically opposed to the notion…]  Giving an AGI a moral and goal system implicitly, via interacting with it and teaching it in various particular cases and asking it to extrapolate, would be one way to try to transmit the complex information structure of human morality and aesthetics to an AGI system without mucking with top-level goals.  What do you think of this possibility?


Humans don’t have hierarchical, cleanly causal goal systems (where the desirability of subgoals derives directly from their probabilistic contribution to fulfilling a supergoal).  Human goals are more like a network of strange attractors, centered around sex, status, food, and comfort.
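[Editorial illustration, not part of the interview.]  The parenthetical definition above, in which a subgoal is desirable only to the extent that it probabilistically contributes to a supergoal, can be made concrete with a small sketch.  Everything here is an assumption made for illustration: the Goal class, the p_contributes field, and the rule that desirability multiplies down the tree are hypothetical names and simplifications, not a description of any actual AGI design discussed in the interview.

```python
from dataclasses import dataclass, field


@dataclass
class Goal:
    """A node in a hypothetical hierarchical goal tree."""
    name: str
    # Assumed-given probability that achieving this goal helps fulfil its parent.
    p_contributes: float = 1.0
    children: list["Goal"] = field(default_factory=list)


def desirabilities(root: Goal, root_value: float = 1.0) -> dict[str, float]:
    """Derive every goal's desirability from the supergoal alone.

    A subgoal's desirability is its parent's desirability multiplied by the
    probability that the subgoal contributes to that parent.
    """
    result: dict[str, float] = {}

    def walk(goal: Goal, value: float) -> None:
        result[goal.name] = value
        for child in goal.children:
            walk(child, value * child.p_contributes)

    walk(root, root_value)
    return result


# Illustrative tree: the names and numbers are invented for the example.
tree = Goal("supergoal", children=[
    Goal("subgoal_a", 0.5, [Goal("subgoal_a1", 0.4)]),
    Goal("subgoal_b", 0.9),
])
scores = desirabilities(tree)
```

In this toy form nothing has value except by way of the supergoal, which is what makes such a system comparatively easy to audit; a network of competing attractors has no single root from which the other values can be derived.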

It’s desirable to have an AI with a clearly defined goal system at the top because 1) I suspect that strange attractor networks converge to hierarchical goal systems in self-modifying systems, even if the network at the top is extremely complex; 2) such a goal system would be more amenable to mathematical analysis and easier to audit.

A hierarchical goal system could produce a human-like attractor network to guide its actions if it judged that to be the best way to achieve them, but an attractor network is doomed to an imprecise approach until it crystallizes a supergoal.  It’s nice to have the option of a systematic approach to pursuing utility, rather than being necessarily limited to an unsystematic approach.  I’m concerned about the introduction of randomness because random changes to complex structures tend to break those structures.  For instance, if you took out a random component of a car engine and replaced it with a random machine, the car would very likely stop functioning.

My concern with putting the emphasis on teaching rather than a clear hierarchical goal system to analyze human wishes is the risk of overfitting. Most important human abilities are qualities that we are either born with or not, like the ability to do higher mathematics. Teaching, while important, seems to be more of an end-stage tweaking and icing on the cake than the meat of  human accomplishment.  Of course, relative to other humans, because we all have similar genetics, training seems to matter a lot, but in the scheme of all animals, our unique abilities are mostly predetermined during development of the embryo. There’s a temptation to over-focus on teaching rather than creating deep goal structure because humans are dependent on teaching  one another. If we had direct access to our own brains, however, the emphasis would shift very much to determining the exact structure during development in the womb, rather than teaching after most of the neural connections are already in place.

To put this another way: a person born as a psychopath will never become benevolent, no matter the training. A person born highly benevolent would have to be very intensely abused to become evil. In both cases, the inherent neurological dispositions are more of a relevant factor than the training.


One approach that’s been suggested, in order to mitigate existential risks, is to create a sort of highly intelligent “AGI Nanny” or “Singularity Steward.”  This would be a roughly human-level AGI system without capability for dramatic self-modification, and with strong surveillance powers, given the task of watching everything that humans do and trying to ensure that nothing extraordinarily dangerous happens.

One could envision this as a quasi-permanent situation, or else as a temporary fix to be put into place while more research is done regarding how to launch a Singularity safely.

What are your views on this AI Nanny scenario?  Plausible or not?  Desirable or not?  Supposing the technology for this turns out to be feasible, what are the specific risks involved?


I’d rather not endure such a scenario. First, the name of the scenario is too prone to creating biases in appraisal of the idea. Who wants a “nanny”?  Some people would evaluate the desirability of such a scenario merely based on the connotations of a nanny in all-human society, which is stupid. We’re talking about a qualitatively new kind of agent here, not something we can easily understand.

My main issue with the idea of an “AI Nanny” is that it would need to be practically Friendly AI-complete anyway. That is, it would have to have such a profound understanding of and respect for human motivations that you’d be 99% of the way to the “goal” with such an AI anyway. Why not go all the way, and create a solution satisfactory to all, including those who are paranoid about AI nannies?

Since specifying the exact content of such a Nanny AI would be extremely difficult, it seems likely that whatever extrapolation process could create such an AI would be suitable for building a truly Friendly AI as well.  The current thinking on Friendly AI is not to create an AI that sticks around forever, but merely to create a stepping stone to a process that embodies humanity’s wishes.  The AI is just an “initial dynamic” that sticks around long enough to determine the coherence between humanity’s goals and implement it.

The idea is to create an AI that you actually trust. Giving control over the world to a Nanny AI would be a mistake, because you might never be able to get rid of it. I’d rather have an AI that is designed to get rid of itself once its job is done. Creating superintelligence is extremely dangerous, something you only want to do once.  Get it right the first time.

I’m not sure how plausible the scenario is; it would depend upon the talents of the programmer. I’m concerned that it would be possible. I think it’s very likely that if we take stupid shortcuts, we’ll regret it. Some classes of AI might be able to keep us from dying indefinitely, under conditions we find boring or otherwise suboptimal. Imagine a civilization frozen with today’s people and technology forever.  I enjoy the present world, but I can imagine it might get boring after a few thousand years.


Hmmm….  You say that an “AI Nanny” “would need to be practically Friendly AI-complete anyway.”  Could you justify that assertion a little more fully?  That’s not so clear to me.  It seems that understanding and respecting human motivations is one problem; whereas maintaining one’s understanding and goal system under radical self-modification using feasible computational resources is another problem.  I’m not sure why you think solution of the first problem implies being near to the solution of the second problem.


“AI Nanny” implies an AI that broadly respects humans in ways that do not lead to our death or discomfort, but yet restricts our freedom in some way. My point is that if it’s already gone so far to please us, why not go the full length and give us our freedom? Is it really that impossible to please humans, even if you have more computing power and creativity at your disposal than thousands of human races?

The solution of the first problem implies being near to the solution of the second problem because large amounts of self-modification and adjustment would be necessary for an artificial intelligence to respect human desires and needs to begin with. Any AI sophisticated enough to do so well will already have engaged in more mental self-modifications than any human being could dream of. Prepping an AI for open-ended self-improvement after that will be an additional challenging task (I’m not saying that it wouldn’t be), but I don’t think it would be so much more difficult that an “AI Nanny” would offer an attractive local maximum.

I’m worried that if we created an AI Nanny, we wouldn’t be able to get rid of it. So, why not create a truly Friendly AI instead, one that we can trust and provides us with long-term happiness and satisfaction as a benevolent partner to the human race? Pretty simple.

If we had a really benevolent human and an uploading machine, would we ask them to just kickstart the Singularity, or have them be a Nanny first?  I would presume the former, so why would we ask an AI to be a nanny? If we trust the AI like a human, it can do everything a human can do, and it’s the best available entity to do this, so why not let it go ahead and enhance its own intelligence in an open-ended fashion? If we can trust a human then we can trust an intelligently built friendly AGI even more.

I suspect that by the time we have an AI smart enough to be a nanny, it would be able to build itself MNT computers the size of the Hoover Dam, and solve the problem of post-Nanny AI.


The disconnect between my question and your answer seems to be that I think a Nanny AI (without motivation for radical self-modification) might be much easier to make than a superintelligence which keeps its goals stable under radical self-modification (and has motivation for radical self-modification).  Yeah, if you think the two problems are roughly of equal difficulty, I see why you’d see little appeal in the Nanny AI scenario.


Yes, there’s the disagreement. I’d be interested in reading your further arguments for why one is so much harder than the other, or why the AI couldn’t make the upgrade to itself with little human help at that point.


Why do I think a Nanny AI is easier than  a superintelligent radically self-modifying AGI?  All a Nanny AI needs to do is to learn to distinguish desirable from undesirable human situations (which is probably a manageable supervised classification problem), and then deal with a bunch of sensors and actuators distributed around the world in an intelligent way.  A super-AI on the other hand has got to deal with situations much further from those foreseeable or comprehensible by its creators, which poses a much harder design problem IMO…


Again, overfitting. Perhaps it’s desirable to me to risk my life walking a tightrope; an intelligently designed Nanny AI would be forced to stop me. The number of special cases is too extreme; it requires real understanding. Otherwise, why bother with a Nanny AI? Why not create an AI that just fulfills the wishes of a single steward human? I’d rather have someone with real understanding in control than a stupid AI that is very powerful but lacks basic common sense and the ability to change or listen. If something is more powerful than me, I want it to be more philosophically sophisticated and benevolent than me, or I’m likely against it. (Many people seem to be against the idea of any agents more powerful than them, period, which is ironic because I don’t exactly see them trying to improve their power much either.)


Yeah, it requires real understanding — but IMO much less real understanding than maintaining goal stability under radical self-modification…

As to why to prefer a Nanny AI to a human dictator, it’s because for humans power tends to corrupt, whereas for AGIs it won’t necessarily.  And democratic human institutions appear probably unable to handle the increasingly dangerous technologies that are going to emerge in the next N decades…

Another point is that personally, as an AI researcher, I feel fairly confident I know how to make a nanny AI that could work.  I also know how to make an AGI that keeps its goal system stable under radical self-modification, BUT the catch is that this design (GOLEM) requires an infeasible amount of computational resources.  I do NOT know how to make an AGI that keeps its goal system stable under radical self-modification and runs using feasible computational resources, and I wonder if designing such a thing is waaaay beyond our current scientific/engineering/mathematical capability.


How on Earth could you be confident that you could create a Nanny AI with  your current knowledge?  What mechanism would you design to allow us to break the AI’s control once we were ready? Who would have control of said mechanism?


The AI would need to break its control itself, once it was convinced we had a plausible solution to the problem of building a self-modifying AGI with a stable goal system.  Human dictators don’t like to break their own control (though it’s happened) but AIs needn’t have human motivations…


If the AI can make this judgment, couldn’t it build a solution itself?  An AI with the power to be a Nanny would have more cognitive resources than all human beings that have ever lived.


Very often in computer science and ordinary human life, *recognizing* a solution to a problem is much easier than actually finding the solution….  This is the basis of the theory of NP-completeness for example….  And of course the scientists who validated Einstein’s General Relativity theory mostly would not have been able to originate it themselves…
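[Editorial illustration, not part of the interview.]  The asymmetry between recognizing and finding a solution can be seen in a toy version of the subset-sum problem, a standard NP-complete problem.  The function names verify and find are invented for this sketch, and the brute-force search is only the simplest possible solver, chosen to make the contrast visible.

```python
from itertools import combinations


def verify(numbers, target, candidate):
    """Check a proposed solution in polynomial time.

    The candidate must draw its elements from `numbers` (respecting
    multiplicity) and must sum to `target`.
    """
    pool = list(numbers)
    for x in candidate:
        if x not in pool:
            return False
        pool.remove(x)
    return sum(candidate) == target


def find(numbers, target):
    """Search for a solution by brute force.

    In the worst case this examines all 2**len(numbers) subsets, so its
    cost grows exponentially while verify() stays cheap.
    """
    for size in range(len(numbers) + 1):
        for combo in combinations(numbers, size):
            if sum(combo) == target:
                return list(combo)
    return None


numbers = [3, 34, 4, 12, 5, 2]
solution = find(numbers, 9)          # expensive: searches many subsets
accepted = verify(numbers, 9, solution)  # cheap: checks one proposal
```

A checker that can only run verify is still useful: it can accept or reject proposals made by someone else, which is the role being suggested here for the Nanny AI.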

However, one quite likely outcome IMO is that the Nanny AI, rather than human scientists, is the one to find the solution to “radically self-modifying AGI with a stable goal system” … thus obsoleting itself 😉 ….  Or maybe it will be a collaboration of the  Nanny with the global brain of internetworked brain-enhanced humans … it should be fun to find out which path is ultimately taken!! …


OK, that last clause sounds vaguely similar to my SIAI colleague Eliezer Yudkowsky’s idea of “Coherent Extrapolated Volition”, which I’m sure you’re familiar with.  CEV, also, would start out gathering info and would only create the AI to replace itself once it was confident it did the extrapolation well. This would involve inferring the structure and function of human brains.


I admit the details of the CEV idea, as Eliezer explained it, never made that much sense to me.  But maybe you’re interpreting it a little differently.


CEV just means extrapolating what people want. We do this every day. Even salamander mothers do this for their children. The cognitive machinery that takes sense perceptions of an agent and infers what that agent wants from those sense perceptions is the same sort of machinery that would exist in CEV.


Hmmm…. That’s not exactly how I understood it from Eliezer’s paper….   What he said is:

our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

To me this seems a lot different than just “extrapolating what people want” …


The thing is that if all these qualifications were not here, extrapolation would lead to suboptimal outcomes. For instance, you must have made decisions for your children that were more in alignment with what they would want if they were smarter. If you made judgments in alignment with their actual preferences (like wanting to eat candy all day — I don’t know your kids but I know a lot of kids would do this), they would suffer for it in the longer term.

If extrapolations were made taking into account only our current knowledge, and not our knowledge if we knew more, really bad things could happen.

If extrapolations were made based on our human-characteristic thinking speeds, rather than the long-term equilibria of thinking that we would reach immediately if we thought faster, bad things could happen.

If extrapolations were made based on the people we are — often petty and under the control of short-term motivations, rather than who we wished we were, bad things could happen.

The same for each element above. I can understand why you might disagree with some of the above bullet points, but it’s hard to imagine how you could disagree with the notion of volition extrapolation in general.  It is a marvel of human intelligence and inference that no sometimes means yes and yes means no. An AI without a subtle extrapolation process will miss this entirely, and make choices for us that are too closely related to our current states, providing lock-in that we would never have chosen if we were superintelligences.

Salamanders extrapolate preferences. Humans extrapolate preferences. Superintelligences will extrapolate preferences. Each new level of intelligence demands a higher level of wisdom for extrapolation. A superintelligence that uses human-level extrapolation algorithms to fulfill wishes would be a menace.


Hmmmm…..  Well, to me, the idea of “what Bob would want, if Bob were more of the person Bob wishes he was” is a bit confusing, because “the person Bob wishes he was” is a difficult sort of abstraction.  Bob doesn’t usually genuinely know what kind of person he wishes he was.  He may think he wishes he was an enlightened Zen master – and if he became an enlightened Zen master he might be quite contented that way – but yet the fact that he never took action to become that Zen master during his life in spite of many opportunities, still indicates that large parts of him didn’t really want that….  The notion of “the person you want to be” isn’t well-defined at all….

And looking at cases where different people’s wishes cohere is pretty dangerous too. For one thing you’d likely be throwing out your valued rationality, as that is certainly not something on which most people’s wishes cohere.  Belief in reincarnation is more likely to make it into the CEV of the human race than rationality.

And looking at the desires you would have if you were “more of the person you wish you were” is probably going to exaggerate the incoherence problem, not mitigate it.   Most religious people will wish they were even MORE religious and god-fearing, so I’d be less coherent with their ideals than I am with their actual selves…


“More like the person I wish I was” is not a difficult abstraction. I have many desired modifications to my mental architecture, and I would prefer that an AI take that into account in its judgments. If Bob has dark thoughts at times, Bob wouldn’t want those dark thoughts to be integrated into the preference aggregation algorithm. It seems simple enough. Without this explicit precaution, said dark thoughts that Bob would choose to be excluded from the preference aggregator would be included anyhow.

The list of items in the definition of Coherent Extrapolated Volition is a way of saying to the AI, “take this into account too”. The alternative is to not take them into account. That seems bad, because these items obviously should be taken into account.


Hmmm… but I think separating out certain thoughts of Bob’s from the rest of his mind is not a very easy or well-defined task either.  The human mind is not a set of discretely-defined logical propositions; it’s a strangely-tangled web of interdefinitions, right?

You may not remember, but a couple of years ago, in reaction to some of my concerns with the details of the CEV idea, I defined an alternative called Coherent Aggregated Volition (CAV).  The idea was to come up with a CEV-like idea that seemed less problematic.  Basically, CAV is about trying to find a goal that maximizes several criteria together, such as consistency, and matching closely on average to what a lot of people want, and compactness, and supported-ness by evidence.  I guess this fits within your broad notion of “extrapolation” but it seems rather different from CEV the way Eliezer stated it.


This is extremely undesirable, because the idea is not to average out our existing preferences, but to create something new that can serve as a foundation for the future. Similarity to existing preferences should not be a criterion. We are not trying to create a buddy but a Transition Guide, a massively powerful entity whose choices will de facto set the stage for our entire future light cone. The tone of this work, especially with regard to the language about the averaging of existing preferences, does not take the AI’s role as Transition Guide sufficiently into account.


Hmmm …. I just think the Transition Guide should start from where we are, not from where we (or our optimization algorithm) speculate we might be if our ideal were much smarter, etc….

I think we should provide a superhuman AI initially with some basic human values, not with some weird wacky far-out extrapolation that bears no noticeable resemblance to current human values….  Sure, a little extrapolation is needed, but only a little….

Still, I guess I can agree with you that some idea in the vague vicinity of what you guys call “CEV” is probably valuable.  I could write a detailed analysis of why I think the details of Eli’s CEV paper are non-workable, but that would take a long day’s work, and I don’t want to put a day into it right now.  Going back and forth any further on particular points via email probably isn’t productive….

Perhaps one way to express the difference is that:

– CAV wants to get at the core of real, current human values, as manifested in real human life

– CEV wants to get at the core of “what humans would like their values to be”, as manifested in what we would like our life to be if we were all better people who were smarter and knew more

Does that feel right to you as a rough summary of the distinction?


Yes, the difference between CEV and CAV that you list makes sense.


I suppose what I’m getting at is – I think there is a depth and richness and substance to our current, real human values; whereas I think that “what we would like our values to be” is more of a web of fictions, not to be relied upon….

That is – to put it more precisely — suppose one used some sort of combination of CEV and CAV, i.e. something that took into account both current human values, and appropriately extrapolated human values.  And suppose this combination was confidence-weighted, i.e. it paid more attention to those values that were known with more confidence.  Then, my suspicion is that when the combination was done, one would find that the CAV components dominated the CEV components, because of the huge uncertainties in the latter…  But it seems you have a very different intuition on this…
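To make the confidence-weighting argument above concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function name `confidence_weighted`, the idea of scoring a candidate goal on a single 0-to-1 scale, and the particular numbers are hypothetical and do not come from the CEV or CAV proposals. It only shows how a low-confidence component ends up with little influence on a combined estimate.

```python
def confidence_weighted(estimates):
    """Combine (value, confidence) pairs into one value.

    `estimates` maps a source name to a (value, confidence) tuple.
    Returns the confidence-weighted mean and each source's share of the weight.
    """
    total = sum(conf for _, conf in estimates.values())
    if total <= 0:
        raise ValueError("at least one estimate needs positive confidence")
    combined = sum(value * conf for value, conf in estimates.values()) / total
    shares = {name: conf / total for name, (_, conf) in estimates.items()}
    return combined, shares


# Hypothetical scores for one candidate goal (illustrative numbers only):
# current values (CAV) are assumed to be known with high confidence,
# extrapolated values (CEV) with low confidence.
estimates = {
    "CAV": (0.2, 0.9),
    "CEV": (0.8, 0.1),
}

combined, shares = confidence_weighted(estimates)
print(f"combined score: {combined:.2f}")
for name, share in shares.items():
    print(f"{name} share of total weight: {share:.0%}")
```

With these made-up inputs the CAV estimate carries most of the weight, so the combined score sits close to it. The outcome depends entirely on the confidences one assigns, which is exactly the point on which the two intuitions differ.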

But anyway … I can see this is a deep point that goes beyond the scope of an interview like this one!  Actually this is turning into more of a debate than an interview, which is good fun as well.  But, I think I’d better move on to my next question!

So here goes…  Another proposal that’s been suggested, to mitigate the potential existential risk of human-level or superhuman AGIs, is to create a community of AGIs and have them interact with each other, comprising a society with its own policing mechanisms and social norms and so forth.  The different AGIs would then keep each other in line.  A “social safety net” so to speak.  Steve Omohundro, for example, has been a big advocate of this approach.

What are your thoughts on this sort of direction?


Creating a community of AIs is just a way of avoiding the challenge of making an AI you trust.

Create an AI you trust, then worry about the rest. An AI that understands us. Someone we can call our friend, our ally. An agent really on our side. Then, the rest will follow. The key is not to see AI as an alien but as a potential friend. Regard AI as necessarily our enemy, and we will fail.

The universe is not fundamentally Darwinian. If the nice guy has all the weapons, all the control, then the thief and the criminal are screwed. We can defeat death. That’s an affront to Darwinian evolution if there ever was one.  We don’t need to balance AIs off against each other. We need a proxy to a process that represents what we want.

An AI is not like a human individual. A single AI could actually be legion. A single AI might split its awareness into thousands of threads as necessary. Watson does this all the time when it searches through many thousands of documents in parallel.

We don’t need to choose exactly what we want right away. We can just set up a system that leaves the option open in the future. Something that doesn’t lock us into any particular local maximum in the fitness space.

Eliezer nailed this question in 2001.  He really had his thumb right on it.  From the FAQ section of Creating Friendly AI:

Aren’t individual differences necessary to intelligence?  Isn’t a society necessary to produce ideas?  Isn’t capitalism necessary for efficiency?

Individual differences and the free exchange of ideas are necessary to human intelligence because it’s easy for a human to get stuck on one idea and then rationalize away all opposition. One scientist has one idea, but then gets stuck on it and becomes an obstacle to the next generation of scientists. A Friendly seed AI doesn’t rationalize.  Rationalization of mistaken ideas is a complex functional adaptation that evolves in imperfectly deceptive social organisms. Likewise, there are limits to how much experience any one human can accumulate, and we can’t share experiences with each other. There’s a limit to what one human can handle, and so far it hasn’t been possible to build bigger humans.

As for the efficiency of a capitalist economy, in which the efforts of self-interested individuals sum to a (sort of) harmonious whole:  Human economies are constrained to be individualist because humans are individualist. Local selfishness is not the miracle that enables the marvel of a globally efficient economy; rather, all human economies are constrained to be locally selfish in order to work at all. Try to build an economy in defiance of human nature, and it won’t work. This constraint is not necessarily something that carries over to minds in general.

Humans have to cooperate because we’re idiots when we think alone due to egocentric biases. The same does not necessarily apply to AGI. You can make an AGI that avoids egocentric biases from the get-go. People have trouble understanding this because they are anthropomorphic and find it impossible to imagine such a being.  They can doubt, but the empirical evidence will flood us from early experiments in infrahuman AI.  You can call me on this in 2020.


Hmmm….  I understand your and Eliezer’s view, but then some other deep futurist thinkers such as Steve Omohundro feel differently.  As I understand it, Steve feels that a trustable community might be easier to create than a trustable “singleton” mind.  And I don’t really think he is making any kind of simplistic anthropomorphic error.   Rather (I think) he thinks that cooperation between minds is a particular sort of self-organizing dynamic that implicitly gives rise to certain emergent structures (like morality for instance) via its self-organizing activity….

But maybe this is just a disagreement about AGI architecture — i.e. you could say he wants to architect an AGI as a community of relatively distinct subcomponents, whereas you want to architect it with a more unified internal architecture??


Possibly! Maybe the long-term outcome will be determined by which is easier to build, and my preferences don’t matter because one is just inherently more practical. Most successful AIs, like Watson and Google, seem to have unified architectures. The data-gathering infrastructure may be distributed but the decision-making, while probabilistic, is more or less unified.


One final question, then.  What do you think society could be doing now to better mitigate existential risks … from AGI or from other sources?  More specific answers will be more fully appreciated 😉 …


Create a human-friendly superintelligence. The arguments for why this is a good idea have been laid out numerous times, and are the focus of Nick Bostrom’s essay “Ethical Issues in Advanced Artificial Intelligence”. An increasing majority of transhumanists are adopting this view.


Hmmm…. do you have any evidence in favor of the latter sentence?  My informal impression is otherwise, though I don’t have any evidence about the matter….  I wonder if your impression is biased due to your own role in the Singularity Institute, a community that certainly does take that view.  I totally agree with your answer, of course – but I sure meet a lot of futurists who don’t.


I mostly communicate with transhumanists that are not already Singularitarians because I am an employee that interfaces with the outside-of-SIAI community. I also have gotten hundreds of emails from media coverage over the past two years. If anything my view is biased towards transhumanists in the Bay Area, of which there are many, but not necessarily transhumanists with direct contact with those who already advocate friendly superintelligence.


Very interesting indeed.  Well, I hope you’re right that this trend exists; it’s good to see the futurist community adopting a more and more realistic view on these issues, which as you point out are as important as anything on the planet these days.


  1. I think it’s important that AGIs feel socially integrated with human heritage; that they are us. In the long run we will be them, so I hope there is a continuity of moral values.

    Our brains are an interesting amalgam of old and new parts, and right now both are needed to be a human. Specifically, people with injury to their prefrontal cortex lose self control (e.g., gluttonous, abusive to others, etc) while injury to the “older” structures for emotion is common in sociopaths and mass murderers. The emergent behavior of individuals is an interplay of neurological and social factors. (BTW, check out the works of Jonathan Haidt; I found them very informative on human nature.) I mention all this because I hope the dominant AGI forms of the future possess hard-wired compassion towards others.

    I hope we can make the singularity an evolution event (with backward compatibility) instead of an extinction/replacement event.

  2. 1. So much blabbing and so little actual work.

    2. How is it ever ethical to experiment with a consciousness?

  3. “We only have one chance to get this right” seems like a pretty misleading summary of the idea that we must not make any mistakes that can’t be recovered from. It is misleading in the direction of making the future sound dangerous and scary. ISTM that there are a lot of things like that – it is a kind of systematic alarmism.

  4. Are we trying to create our own God ?

    But, what if mankind already lives under the umbrella of one strong diffuse singleton ? This might be the profit making economy that is forced upon all human beings and brings enormous wealth and power to just a few. Are we really free to opt out of this world of profit mongers which is on the verge of destroying itself completely ? The way some impose embargos on Cuba allows us to answer the question with a strong NO.

    Creating benevolent superintelligence of any kind would lead to a major confrontation with our economic system which is biased and based on greed, and takes its roots in our ego. Ultimately, the BS may decide to eradicate the notion of profit for the good of humanity because misery is the outcome and humanity is about to face a threat. Major threats to humanity have already been identified such as Climate Change which may exterminate the whole of mankind. The application of a benevolent superintelligence (which may take the place of God) will surely lead to dramatic changes in mankind against its corrupted and egoistic will. As it is today, a large number of people would fall in denial and fight the common good selected by this higher authority. People’s freedom would be at stake even if they do not realise that they are being manipulated when they wish to apply their free will, refuse the changes and keep the status quo because it suits them all too well, incapable of seeing the big picture. Then, the benevolent superintelligence would need to bring very effective arrests, neutralisation and punishment of those who do not respect the new rules applied for their own good.

    The benevolent superintelligence may also react by removing each person’s right to liberty on the elitist grounds that man cannot help himself lifting himself above the mediocre and greedy considerations of self-satisfactions through pleasures which are not leading to happiness. It is very easy to observe that, in our modern day societies, we live in scenarios where we are about to make enormously dramatic mistakes, beyond repair, since nobody can really show the courage to tackle the risk that the human race disappears.

    Chances are that the main action of a benevolent superintelligence will be to cancel mankind’s ridiculous arms races leading this world to complete chaos since most of our economies are based on arms construction and selling. This would mean large economic conversions where people would have to adapt to dramatic changes.

    How would it deal with the major problem that we all face within our agencies: corruption, which has reached levels that make some countries completely useless ? Knowing that people using corruption are also capable of reverting to organised crime to further impose their powers.

    Hence, the people in power would not allow the existence of such a benevolent superintelligence and would prefer one which is conservative, meaning one that helps them to keep and maintain their positions, as shown in 1984 or Brave New World. They have killed God and they are much more likely to kill anything good which may serve the common good of humanity and apply the necessary measures to bring humanity in a good direction. I am writing these lines on Good Friday on purpose, since they already have killed the Messiah who had this purpose.

    The twisted or even bad superintelligence is much more likely to be leading to a totalitarian regime of titanic proportion slowly drugging humans to keep them away from liberty, theology and philosophy leading them on the path of happiness on the grounds that humans would require liberation from all these oppressing powers including going into resistance. In the past and even today, some religions were and are manipulated by some authorities to maintain a subtle application of a strong superintelligence which control the masses and where integrists are always available to apply punishment of the infidels to the ultimate authority. After WWII, it appears that Christianity has freed itself from political corruption, but the others have not yet.

    In order to do so more effectively, the people’s opium has shifted from religion to the media. This whole system may resemble a form as we see it in the movie Matrix but not allowing any disobedience by severe punishment.

    But what if there is a true God? What if it is benevolent? And is trying to communicate with mankind? Nothing would prevent the new superintelligence to be capable to create bonds with a God or even ET life forms since any higher level of intelligence would gain some kind of consciousness and freedom, meaning that its relationship with a God or even other lifeforms may completely happen without mankind being aware of it as we do not know if the whales or dolphins have any notion of a God or not.

  5. The discussion is a worthwhile conversation starter, but I must say I was left disappointed by the lack of understanding of complex systems. I think it is time for your community to acknowledge that we are not anymore dealing with the creation of AGI, but with operations of emerging AGI. We need to refresh the problem statements.

    The border of perceived randomness in natural processes has been continuously pushed back by better predictive models. The notions of random processes and probability are good tools for building initial heuristic models, but there is no formidable evidence of truly random processes in the Universe. These notions exist due to our (and future AGIs’) information collection and processing limitations.

    During the last decade, network theory has proposed useful models for interpreting complex systems and making distinctions from random processes. For example in the light of interaction networks, naive examples like car engine failures can be analyzed for robustness against random node failures. The small world and scale-free sections of social processes give us a completely different perspective into the robustness of intelligent interaction networks.

    More practically oriented industrial applications have started utilizing interesting heuristic approaches to understanding network dynamics through model ensembles. The predictive power of such ensembles is already today past human comprehension. For example in the high-frequency trading field, humans are slowly but surely being relegated to containing the consequences of algorithm interactions. The perceived borders of randomness have moved beyond human capabilities.

    In my view, the growing interaction of specialized AIs of today is the bootstrapping phase of AGI. The learning layer (e.g. averaging/bagging/boosting in its simplest and most local forms) built on top is merging human computational capabilities together with machines. The resulting system of systems is controlled by a continuous evolution of heuristic models into formal models. This layer is already today beyond human comprehension and we participate mainly as a crowdsourcing resource. I think by the time humanity will be able to develop concepts to interpret planetary scale consciousness, values, behavior and intelligence, we will be surprised by the reality. Collective intelligence leads to emergent behavior with different values than those of individuals.

    To sum this up: I think the AI community is in dire need of a wake-up call from the complex systems community. We are not in control, even though we have impact. There is no such thing as human-friendly AGI, but just a system that we are a part of. The role you have in the system will depend on your historic capabilities.

  6. Michael’s blog has been listed as “dangerous” for approximately the past two months. I asked my anti-virus software vendor to check if this was a “false positive” and they said Michael’s blog is dangerous. I also emailed Michael alerting him to the dangerous nature of his blog but he hasn’t removed or solved the infection and he didn’t deny that it exists.

    My point is this: if Michael cannot remove dangers from his own blog how can we have confidence regarding Michael’s suggestions regarding protections from dangerous threats to our existence?

    Undoubtedly researchers in the fields of AI, biotech, nanotech, and robotics already have safety procedures to ensure no existential threats occur. Government laws regulate research to ensure compliance with safety standards.

    Ironically regarding the so-called “loafer problem” maybe Michael expects other people to protect themselves from his site?

    The “Existential Threats crowd” seem to have a disproportionate obsession with fear. They are disproportionately fearful regarding the future. I think such overly fearful people could very easily create the future they fear via their fears. The concept of Self-Fulfilling Prophecy reveals how people can easily make their fears come true.

    This is what my anti-virus software says when it scans Michael’s blog:

    Dangerous: This page contains active threats.
    Risk Category: Exploit server
    Risk Name:
    Scanned on: 04/22/11 06:59:37
    (0.00 seconds to scan this page)
    Ratings are provided by AVG. Site owners please contact AVG for questions.

    • Fears …

      You fear Altitude, because you die if you jump in the void : and your speed will be exponential : then you crash

      A transhumanist being is not a dead being

      A living smart being will try to take the bridge ( remember nietzsche ? ) , or you can even take a parachute

      the metaphor is maybe near what is happening : you can jump from the space station, and with a parachute : and IF you open your parachute before your speed is too fast … you CAN SURVIVE

      Orelse …

      • Welcome back on earth

      • Fears of altitude is genetic, species learn that this is not something to do

        • People who don’t fear Altitude a little, are blind , blind young being, or MAD

          • Excessive fear can easily become irrational phobia, paranoia, anxiety disorder.

            Fears are crucial for health but excessive fear is detrimental to health. I think the level of fear regarding “existential threats” has definitely reached a level where it could be classified as an irrational mental disorder. I stated earlier that: “Undoubtedly researchers in the fields of AI, biotech, nanotech, and robotics already have safety procedures to ensure no existential threats occur. Government laws regulate research to ensure compliance with safety standards.”

            My point is that safety standards already exist and are implemented therefore we don’t need an excessive paranoiac focus on threats.

            • The biggest threat to human beings is human beings

              Watch the USA, maybe your country: people die from poverty and violence, and 1% of people are in prison ( this is concentration camp man : open your eyes )

              The dream you live in : is dangerous

              400 people in usa has more than 150 million people in USA

              OK ?

              Take the money of one, of these 400 people : and maybe if you choose well : you can become the god you want to be, the monster you want to be, and do whatever you want to other being.

              I am less afraid of smart AGI , than A rich man for the oligarchy who become the first uploaded guy.

              Don’t you see : people with power, want a pyramidal elitist transhumanism …

              ( and you know : maybe “industrial revolution,” and public education : was just for this GOAL )

              A pyramidal singularity means 6 billion human dead, a networked singularity means 6 billion human more open minded

              If you cannot explain WHY THERE IS FOOD FOR EVERYBODY, AND PEOPLE DIE : IN rich countries ? Why is there poverty in rich countries ? Why ?
              Because there are no more jobs because of technology ? YES BUT WHY ? WHY DON’T HUMAN CONSUMERS GET WHAT THEY NEED ?

              You have a serious problem man…

              A world of sharing must replace the sharing of the world

              At one point : you need consumers to consume : or else

              THE END : no more economy

              CHAOS for the poor

              • Dear Eric K, I am already very aware of the great danger humans present regarding the future therefore we should NOT underestimate the possibility that humans could actually create, via a Self-fulfilling prophecy, the existential threat which they fear. The fear is often the most dangerous thing.

                It has been said that we have nothing to fear but fear itself.

                I am also very aware of the hardships created via poverty thus over the past 12 months I’ve been trying to raise awareness of Post-Scarcity, with a view to creating changes in civilization.

                By the time human-uploading is possible, Post-Scarcity will also be possible thus greed will be irrelevant, furthermore stupidity which motivates greed will be obsolete in the future.

                The reason why people starve despite the current abundance of food has three factors:

                1. Food is not yet superabundant.
                2. The current abundance of food is a relatively new phenomenon due to modern farming.
                3. People are hardwired for greed because during times of extreme scarcity greed was an essential survival trait, thus relinquishing our hardwired greed mentality is a slow process. People cannot fully put their trust in abundance because although technology is very accomplished we are not yet experiencing the total resilience of Post-Scarcity.

                There is a BIG difference between abundance and superabundance.

  7. It is worth pointing out that success probabilities matter a lot in considering what solutions to favor. Creating benevolent superintelligence of any kind is probably significantly harder than we may think. If there are differences in success vs. failure probabilities of various approaches, expected value considerations should take these seriously, even if the projected outcomes lack certain features that we would want to see in an absolutely ideal post-human future (e.g. the full range of contemporary human values).

    For instance, CEV/CAV seem like messy business with sufficient vagueness and potential failure modes to make me skeptical. It is not at all clear to me why anyone would, say, want to include the values of religious fundamentalists into any kind of supergoal for future decision-making entities. Or how to keep Homo sapiens’ notorious in-group/out-group dynamics out of it. If you rely on vague conditions like “if we knew more” or “if we had grown up farther together” in order to filter all of humanity’s malevolence and stupidity out, you may just as well formulate an abstract value theory, let the AI refine it conceptually within certain bounds, and hard-code the result as a supergoal.

    Compare the success probability of creating a paperclip maximizer with the success probability of creating reliable benevolence out of CEV/CAV. It seems to me that a paperclip maximizer is a lot easier to construct and a lot more predictable in the world-states that it will try to achieve.

    Now replace the paperclips with something that actually has utilitarian value. Orgasmium is a straightforward concept, but it is lacking a crucial ingredient in human values: Diversity and complexity. Ultimately, if you strip away illusory philosophical concepts (the self, free will…), you can simplify human values to a degree that makes them… measurable and reproducible. More complex than paperclips, but still something that a pre-specified algorithm based on a rational value theory can handle. It seems to me that all actual human values boil down to abstract components such as sentient affect, (inter-)personal narrative, and diversity of experience.

    Why not create a “deluxe orgasmium” maximizer that facilitates these values directly? Imagine a propagation system that transforms resources into sentient minds who experience positive affect (pleasure, happiness etc.) in a permutation of diversity (including aspects like communication, sensory multi-modality, subjective freedom) and narrative without actually allowing free memetic evolution or decision-making of these minds. The state-space of created observer moments is diverse and vast, but free from below-zero affect (i.e. suffering), and basically pre-defined algorithmically.

    The obvious advantages here are a de-coupling of sentient experience from how the system is organized, and the predictability of its behavior, resulting in a higher success probability (remember how high the stakes are here!).

    The system can be highly efficient and expand into space, like a paperclip maximizer would – except that the paperclips are in fact sentient observer moments of rich hedonistic diversity, implemented in brains-in-vats or through sentient uploads.

    You may object because you value freedom, but freedom is an illusion anyway – the only freedom there ever has been is the subjective feeling of it. If you are under the illusion that you have a consciousness-bearing self that has token-identity through time, you may worry what would happen to you personally in such a system. Most people would hesitate to give up all of their personal power to a hedonic algorithm. As a compromise, all existing brains could be integrated into the matrix in a way that they can initially choose, i.e. they consent to the specific parameters of the hedonistic algorithm that will unfold in their brains, maintaining relationships to other brains in virtual worlds if they choose to.

    The key here is that the system’s evolution follows a pre-defined pattern that we can predict will have positive utility and remain stable in the very long run and on the very large scale. We neither want future wars nor involuntary suffering. And the success probability has to be as high as possible, which means we can’t use vague extrapolations like CEV/CAV to gamble astronomical amounts of utility on. It’s a gamble we might well lose, and the opportunity costs of good subjective life would be mind-boggling.

  8. @ mw

    The major problem I see with exclusively going the transhumanist route while neglecting or even abandoning the friendly AGI route is something (I think) Ben mentioned in this interview. People are inherently selfish with a sugarcoating of altruism, and if people get more power, they will usually use it primarily to meet their own ends. If we all become gradually transhuman, then we will have marvellous capabilities… but as long as our minds fundamentally stay the same we will probably also remain fairly selfish.

    Sure, if transhumanism takes off, the major branches of government will make use of that technology to enforce the law etc., so in a way it’s a red-queen situation for those who choose to upgrade – and thus we hopefully won’t see a spike in “transhuman crime”.

    Equally encouraging is the fact that human civilization has made amazing moral progress. Our societies have proven to be marvellously adaptive in the sense that nowadays we can only shake our heads in horror and disgust when we read about the inhuman and barbaric customs of more primitive times. (Torture, Superstition, Genocide, Inquisition, Crusades, Slavery).
    Compared to the last centuries we’ve made remarkable social and moral adaptations for a species that is evolutionarily/genetically speaking largely the same as 50,000 years ago.

    Still, when I think about transhuman futures where we only use fairly primitive AIs to augment ourselves and our selfish desires, I fear we will stay far less than we could be. We’ll remain transhuman with all the problems of our humanity, whereas AGI seems to be a direct route to posthumanism while cutting all the bullshit so to speak.

    My hope for AGI and posthumanism is the emergence of “real” empathy and altruism and understanding and deep connection. Being an animal – no matter how technologically augmented – is still shit.

  9. @ mw

    The major problem I see with exclusively going the transhumanist route while neglecting or even abandoning the friendly AGI route is something (I think) Ben mentioned in this interview. People are inherently selfish with a sugarcoating of altruism, and if people get more power, they will usually use it primarily to meet their own ends. If we all become gradually transhuman, then we will have marvellous capabilities… but as long as our minds fundamentally stay the same we will probably also remain fairly selfish.

    Sure, if transhumanism takes off, the major branches of government will make use of that technology to enforce the law etc., so in a way it’s a red-queen situation for those who choose to upgrade – and thus we hopefully won’t see a spike in “transhuman crime”.

    Equally encouraging is the fact that human civilization has made amazing moral progress. Our societies have proven to be marvellously adaptive in the sense that nowadays we can only shake our heads in horror and disgust when we read about the inhuman and barbaric customs of more primitive times. (Torture, Genocide, Inquisition, Religious wars, Slavery).
    Compared to the last centuries we’ve made remarkable social and moral adaptations for a species that is evolutionarily/genetically speaking largely the same as 50,000 years ago.

    Still, when I think about transhuman futures where we use fairly primitive AIs to augment ourselves and our selfish desires, I fear we will stay far less than we could be. We’ll remain transhuman with all the problems of our humanity, whereas AGI seems to be a direct route to posthumanism while cutting all the bullshit so to speak.

    My hope for AGI and posthumanism is the emergence of “real” empathy and altruism and understanding. Being an animal – no matter how technologically augmented – is still shit.

  10. what’s with all the emphasis on apocalyptic thinking? I think I watched some vid a whiles back where some video game designer (the guy from civ?) on FUN AI. How about 20 interviews and rants about FUN AI? Also how about useful AI for sorting life’s complexities or making difficult things easier. or empowering individuals to do things that were once complex. or to help doing competitive things. something about aggregating human and machine intelligence for personal or family or business empowerment. it would be creepy for an outside force to use our cell phone to tap all our conversation and use voice to text and locational information to farm metrics – but if it was a personal tool – it might be useful for feedback. this sort of datafeed through a personal ai coach could offer all sorts of help with bad impulses and negative spiraling patterns and etc. there’s tons of constructive directions for machine intelligence. focusing on “the singularity” and noiding on it gets tedious after a minute. there are many tools that eclipse human ability. but humans are a symbiotic system. there may be a competitive red queen type creep and tipping point related to agi tools but we get that now in all sorts of ways.

  11. A question for you two

    a) China’s goal is to create an AGI, and they are using supercomputers to do it.
    b) The USA is in an AGI race with China

    Have you heard anything about an AGI project on a supercomputer in the USA, and a BIG project in the BIG society in the USA ?

    • I know it is not only a matter of computational power, but it helps
