Mitigating the Risks of Artificial Superintelligence

“Existential risk” refers to the risk that the human race as a whole might be annihilated.  In other words: human extinction risk, or species-level genocide.  This is an important concept because, as terrible as it would be if 90% of the human race were annihilated, wiping out 100% is a whole different matter.

Existential risk is not a fully well-defined notion, because as transhumanist technologies advance, the border between human and nonhuman becomes increasingly difficult to draw.  If humans somehow voluntarily “transcend” their humanity and become superhuman, this seems a different sort of scenario than everyone being nuked to death.  However, philosophical concerns aside, there are sufficiently many clear potential avenues to human extinction to make the “existential risk” concept valuable — including nanotech arms races, risks associated with unethical superhuman AIs, and more mundane risks involving biological or nuclear warfare.  While one doesn’t wish to approach the future with an attitude of fearfulness, it’s also important to keep our eyes open to the very real dangers that loom.

Michael Anissimov ranks among the most prominent and effective voices discussing the issue of existential risk, along with other issues related to the Singularity and the future of humanity and technology.   Currently the Media Director for the Singularity Institute, as well as a Board member of Humanity+, Michael is Co-Organizer of the Singularity Summit and a member of the Center for Responsible Nanotechnology’s Global Task Force.   His blog Accelerating Future is deservedly popular, featuring in-depth discussion of many important issues related to transhumanism.

The following quote summarizes some of Michael’s high-level views on existential risk:

I cannot emphasize this enough. If an existential disaster occurs, not only will the possibilities of extreme life extension, sophisticated nanotechnology, intelligence enhancement, and space expansion never bear fruit, but everyone will be dead, never to come back. This would be awful. Because we have so much to lose, existential risk is worth worrying about even if our estimated probability of occurrence is extremely low.

Existential risk creates a ‘loafer problem’ — we always expect someone else to handle it. I assert that this is a dangerous strategy and should be discarded in favor of making prevention of such risks a central focus.

In this dialogue I aimed to probe a little deeper, getting at Michael’s views on the particular nature of the risks associated with specific technologies (especially AI), and what we might do to combat them.   I knew this would be an interesting interview: I’d talked informally with Michael about these ideas a few times before, and we had many areas of disagreement along with broad areas of concurrence.   So the interview veers from vehement agreement into some friendly debate – I hope you’ll enjoy it!

Ben:

What do you think is a reasonable short-list of the biggest existential risks facing humanity during the next century?

Michael:

1. Unfriendly AI.

2. Selfish uploads.

3. Molecular manufacturing arms race.

Ben:

What do you think are the biggest *misconceptions* regarding existential risk — both among individuals in the futurist community, broadly conceived, and among the general public?

Michael:

Underestimating the significance of superintelligence.  People have a delusion that humanity occupies some theoretically optimal plateau of intelligence (due to brainwashing from Judeo-Christian theological ideas, which also permeate so-called “secular humanism”), which is the opposite of the truth.  We’re actually among the stupidest possible species smart enough to launch a civilization.

Ben:

One view on the future of AI and the Singularity is that there is an irreducible uncertainty attached to the creation of dramatically greater than human intelligence.  That is, in this view, there probably isn’t really any way to eliminate or drastically mitigate the existential risk involved in creating superhuman AGI. So, in this view, building superhuman AI is essentially plunging into the Great Unknown and swallowing the risk because of the potential reward (where the reward may be future human benefit, or something else like the creation of aesthetically or morally pleasing superhuman beings, etc.).  Another view is that if we engineer and/or educate our AGI systems correctly, we can drastically mitigate the existential risk associated with superhuman AGI, and create a superhuman AGI that’s highly unlikely to pose an existential risk to humanity.  What are your thoughts on these two views?  Do you have an intuition on which one is more nearly correct?  (Or do you think both are wrong?)  By what evidence or lines of thought is your intuition on this informed/inspired?

Michael:

Would you rather your AI be based on Hitler or Gandhi?

Ben:

Can I vote for a random member of Devo instead?

Michael:

My point is, if you have any preference, that proves you understand that there’s some correlation between a seed AI and the Singularity it grows into.

Imagine that AGI were impossible, and that we had to choose a human being to become the first superintelligence. Say that we knew that that human would acquire power that put her above all others — say, she had the guaranteed ability to charm and brainwash everyone she came into contact with, and direct them to follow her commands. If that had to be the case, then I would advise that we choose someone with as much innate kindness and cleverness as possible. Someone who really cared for humanity as a whole, and had an appreciation for abstract philosophical and moral issues. Someone who was mostly selfless, and understood that moral realism is false.  Someone who followed the axioms of probability theory in their reasoning — someone who systematically made accurate probability estimates, rather than demonstrating overconfidence, underconfidence, or framing biases.

This is the future of the entire light cone we’re talking about — the Galaxy, the Virgo Cluster, and beyond.  Whether we like it or not, many think it’s likely that the first superintelligence would become a singleton, implicitly taking responsibility for the future development of civilization from that point on.  Now, it may be that we feel emotional aversion to the idea of a singleton, but this doesn’t alter the texture of the fitness landscape.  The first superintelligence may, in fact, be able to elevate itself to singleton status quite quickly (say, through rapid self-replication, or perhaps rapid replication of its ideas).  If it can, then we have to do our best to plan for that eventuality, whether or not we personally like it.

Ben:

By the way, I wonder how you define a “singleton”?

I’m personally not sure the concept even applies to radically non-human AI systems.  The individual-versus-group dichotomy works for us minds that are uniquely associated with specific physical bodies with narrow inter-body communication bandwidth, but will it hold up for AGIs?

Michael:

Nick Bostrom covered this in “What is a Singleton?”:

In set theory, a singleton is a set with only one member, but as I introduced the notion, the term refers to a world order in which there is a single decision-making agency at the highest level.  Among its powers would be (1) the ability to prevent any threats (internal or external) to its own existence and supremacy, and (2) the ability to exert effective control over major features of its domain (including taxation and territorial allocation).  …

A democratic world republic could be a kind of singleton, as could a world dictatorship. A friendly superintelligent machine could be another kind of singleton, assuming it was powerful enough that no other entity could threaten its existence or thwart its plans. A “transcending upload” that achieves world domination would be another example.

The idea centers on a single decision-making agency.  That agency could be made up of trillions of sub-agents, as long as they demonstrated harmony in making the highest-level decisions, and prevented Tragedies of the Commons.  Thus, a democratic world republic could be a singleton.

Ben:

Well, the precise definition of “harmony” in this context isn’t terribly clear to me either.  But at a high level, sure, I understand — a singleton is supposed to have a higher degree of unity in its internal decision-making processes, compared to a non-singleton intelligent entity….

I think there are a lot of possibilities for the future of AGI, but I can see that a singleton AGI mind is one relatively likely outcome – so we do need to plan with this possibility in mind.

Michael:

Yes, and it’s “conservative” to assume that artificial intelligence will ascend in power very quickly, for reasons of prudence. Pursuit of the Singularity should be connected with an abundance of caution. General intelligence is the most powerful force in the universe, after all.

Human morality and “common sense” are extremely complex and peculiar information structures.  If we want to ensure continuity between our world and a world with AGI, we need to transfer over our “metamorals” at high fidelity.  Read the first chapter of Steven Pinker’s How the Mind Works to see what I’m getting at.  As Marvin Minsky said, “Easy things are hard!”  “Facts” that are “obvious” to infants would be extremely complicated to specify in code.  “Obvious” morality, like “don’t kill people if you don’t have to”, is extremely complicated, but seems deceptively simple to us, because we have the brainware to compute it intuitively.  We have to give AGIs goal systems that are compatible with our continued existence, or we will be destroyed.  Certain basic drives common across many different kinds of AGIs may prove inconvenient to us when the AGIs implementing them are extremely powerful and do not obey human commands.

To quote the co-founder of the World Transhumanist Association, Nick Bostrom:

The option to defer many decisions to the superintelligence does not mean that we can afford to be complacent in how we construct the superintelligence. On the contrary, the setting up of initial conditions, and in particular the selection of a top-level goal for the superintelligence, is of the utmost importance. Our entire future may hinge on how we solve these problems.

Words worth taking seriously… we only have one chance to get this right.

Ben:

This quote seems to imply a certain class of approaches to creating superintelligence — i.e. one in which the concept of a “top level goal” has a meaning.  On the other hand, one could argue that humans don’t really have top-level goals, though one can apply “top level goals” as a crude conceptual model of some aspects of what humans do.  Do you think humans have top-level goals?  Do you think it’s necessary for a superintelligence to have a structured goal hierarchy with a top-level goal, in order for it to have reasonably high odds of turning out positively according to human standards?  [By the way, my own AI architecture does involve an explicit top-level goal, so I don't ask this from a position of being radically opposed to the notion...]  Giving an AGI a moral and goal system implicitly, by interacting with it, teaching it in various particular cases, and asking it to extrapolate, would be one way to try to transmit the complex information structure of human morality and aesthetics to an AGI system without mucking with top-level goals.  What do you think of this possibility?

Michael:

Humans don’t have hierarchical, cleanly causal goal systems (where the desirability of subgoals derives directly from their probabilistic contribution to fulfilling a supergoal).  Human goals are more like a network of strange attractors, centered around sex, status, food, and comfort.

It’s desirable to have an AI with a clearly defined goal system at the top because 1) I suspect that strange attractor networks converge to hierarchical goal systems in self-modifying systems, even if the network at the top is extremely complex; 2) such a goal system would be more amenable to mathematical analysis and easier to audit.

A hierarchical goal system could produce a human-like attractor network to guide its actions if it judged that to be the best way to achieve its goals, but an attractor network is doomed to an imprecise approach until it crystallizes a supergoal.  It’s nice to have the option of a systematic approach to pursuing utility, rather than being necessarily limited to an unsystematic approach.  I’m concerned about the introduction of randomness, because random changes to complex structures tend to break those structures.  For instance, if you took out a random component of a car engine and replaced it with a random machine, the car would very likely stop functioning.
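To make the contrast concrete, here is a minimal sketch of the kind of cleanly causal goal hierarchy described above, in which a subgoal's desirability is simply its estimated probabilistic contribution to the supergoal's utility. This is an illustration only; the class, goal names, and numbers are hypothetical, not drawn from any actual AGI design discussed here.

```python
# Minimal sketch of a cleanly causal goal hierarchy: each subgoal's
# desirability is derived from its estimated contribution to the supergoal.
# All names and numbers here are illustrative, not a real AGI design.

class Goal:
    def __init__(self, name, utility=0.0):
        self.name = name
        self.utility = utility     # only the supergoal has intrinsic utility
        self.subgoals = []         # list of (subgoal, p_contribution)

    def add_subgoal(self, subgoal, p_contribution):
        """p_contribution: estimated probability that achieving the subgoal
        advances this goal."""
        self.subgoals.append((subgoal, p_contribution))

    def propagate_desirability(self):
        """Push derived desirability down the hierarchy."""
        for sub, p in self.subgoals:
            sub.utility = self.utility * p   # desirability is purely derived
            sub.propagate_desirability()

# Hypothetical example
supergoal = Goal("preserve human well-being", utility=1.0)
study = Goal("model human preferences")
audit = Goal("keep goal system auditable")
supergoal.add_subgoal(study, p_contribution=0.9)
supergoal.add_subgoal(audit, p_contribution=0.7)
supergoal.propagate_desirability()
print(study.utility, audit.utility)   # 0.9 0.7 -- derived, not intrinsic
```

The point of the structure is auditability: every subgoal's desirability can be traced back to the single supergoal, rather than floating free in an attractor network.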

My concern with putting the emphasis on teaching, rather than on a clear hierarchical goal system to analyze human wishes, is the risk of overfitting. Most important human abilities are qualities that we are either born with or not, like the ability to do higher mathematics. Teaching, while important, seems to be more end-stage tweaking and icing on the cake than the meat of human accomplishment.  Of course, relative to other humans, because we all have similar genetics, training seems to matter a lot, but in the scheme of all animals, our unique abilities are mostly predetermined during development of the embryo. There’s a temptation to over-focus on teaching rather than creating deep goal structure because humans are dependent on teaching one another. If we had direct access to our own brains, however, the emphasis would shift very much to determining the exact structure during development in the womb, rather than teaching after most of the neural connections are already in place.

To put this another way: a person born as a psychopath will never become benevolent, no matter the training. A person born highly benevolent would have to be very intensely abused to become evil. In both cases, the inherent neurological dispositions are more of a relevant factor than the training.

Ben:

One approach that’s been suggested, in order to mitigate existential risks, is to create a sort of highly intelligent “AGI Nanny” or “Singularity Steward.”  This would be a roughly human-level AGI system without capability for dramatic self-modification, and with strong surveillance powers, given the task of watching everything that humans do and trying to ensure that nothing extraordinarily dangerous happens.

One could envision this as a quasi-permanent situation, or else as a temporary fix to be put into place while more research is done regarding how to launch a Singularity safely.

What are your views on this AI Nanny scenario?  Plausible or not?  Desirable or not?  Supposing the technology for this turns out to be feasible, what are the specific risks involved?

Michael:

I’d rather not endure such a scenario. First, the name of the scenario is too prone to creating biases in appraisal of the idea. Who wants a “nanny”?  Some people would evaluate the desirability of such a scenario merely based on the connotations of a nanny in all-human society, which is stupid. We’re talking about a qualitatively new kind of agent here, not something we can easily understand.

My main issue with the idea of an “AI Nanny” is that it would need to be practically Friendly AI-complete anyway. That is, it would have to have such a profound understanding of and respect for human motivations that you’d be 99% of the way to the “goal” with such an AI anyway. Why not go all the way, and create a solution satisfactory to all, including those who are paranoid about AI nannies?

Since specifying the exact content of such a Nanny AI would be extremely difficult, it seems likely that whatever extrapolation process could create such an AI would be suitable for building a truly Friendly AI as well.  The current thinking on Friendly AI is not to create an AI that sticks around forever, but merely a stepping stone to a process that embodies humanity’s wishes.  The AI is just an “initial dynamic” that sticks around long enough to determine the coherence among humanity’s goals and implement it.

The idea is to create an AI that you actually trust. Giving control over the world to a Nanny AI would be a mistake, because you might never be able to get rid of it. I’d rather have an AI that is designed to get rid of itself once its job is done. Creating superintelligence is extremely dangerous, something you only want to do once.  Get it right the first time.

I’m not sure how plausible the scenario is; it would depend upon the talents of the programmer. I’m concerned that it would be possible. I think it’s very likely that if we take stupid shortcuts, we’ll regret it. Some classes of AI might be able to keep us from dying indefinitely, under conditions we find boring or otherwise suboptimal. Imagine a civilization frozen with today’s people and technology forever.  I enjoy the present world, but I can imagine it might get boring after a few thousand years.

Ben:

Hmmm….  You say an “AI Nanny” “would need to be practically Friendly AI-complete anyway.”  Could you justify that assertion a little more fully?  That’s not so clear to me.  It seems that understanding and respecting human motivations is one problem, whereas maintaining one’s understanding and goal system under radical self-modification using feasible computational resources is another problem.  I’m not sure why you think solution of the first problem implies being near to the solution of the second problem.

Michael:

“AI Nanny” implies an AI that broadly respects humans in ways that do not lead to our death or discomfort, but yet restricts our freedom in some way. My point is that if it’s already gone so far to please us, why not go the full length and give us our freedom? Is it really that impossible to please humans, even if you have more computing power and creativity at your disposal than thousands of human races?

The solution of the first problem implies being near to the solution of the second problem because large amounts of self-modification and adjustment would be necessary for an artificial intelligence to respect human desires and needs to begin with. Any AI sophisticated enough to do so well will already have engaged in more mental self-modifications than any human being could dream of. Prepping an AI for open-ended self-improvement after that will be an additional challenging task (I’m not saying that it wouldn’t be), but I don’t think it would be so much more difficult that an “AI Nanny” would offer an attractive local maximum.

I’m worried that if we created an AI Nanny, we wouldn’t be able to get rid of it. So, why not create a truly Friendly AI instead, one that we can trust and that provides us with long-term happiness and satisfaction as a benevolent partner to the human race? Pretty simple.

If we had a really benevolent human and an uploading machine, would we ask them to just kickstart the Singularity, or have them be a Nanny first?  I would presume the former, so why would we ask an AI to be a nanny? If we trust the AI like a human, it can do everything a human can do, and it’s the best available entity to do this, so why not let it go ahead and enhance its own intelligence in an open-ended fashion? If we can trust a human then we can trust an intelligently built friendly AGI even more.

I suspect that by the time we have an AI smart enough to be a nanny, it would be able to build itself molecular nanotechnology (MNT) computers the size of the Hoover Dam, and solve the problem of post-Nanny AI.

Ben:

The disconnect between my question and your answer seems to be that I think a Nanny AI (without motivation for radical self-modification) might be much easier to make than a superintelligence which keeps its goals stable under radical self-modification (and has motivation for radical self-modification).  Yeah, if you think the two problems are roughly of equal difficulty, I see why you’d see little appeal in the Nanny AI scenario.

Michael:

Yes, there’s the disagreement. I’d be interested in reading your further arguments for why one is so much harder than the other, or why the AI couldn’t make the upgrade to itself with little human help at that point.

Ben:

Why do I think a Nanny AI is easier than a superintelligent radically self-modifying AGI?  All a Nanny AI needs to do is learn to distinguish desirable from undesirable human situations (which is probably a manageable supervised classification problem), and then deal with a bunch of sensors and actuators distributed around the world in an intelligent way.  A super-AI, on the other hand, has got to deal with situations much further from those foreseeable or comprehensible by its creators, which poses a much harder design problem IMO…
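As a rough illustration of the “manageable supervised classification problem” framing, here is a toy sketch that treats world-situations as human-labeled feature vectors, assuming scikit-learn as the library; the features, data, and decision logic are entirely hypothetical.

```python
# Toy sketch of the "Nanny AI as supervised classifier" framing:
# learn to label world-situations as acceptable or dangerous from
# human-labeled examples. Features and data are purely hypothetical.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Each row is a crude feature vector for a "situation", e.g.
# [self_replicating_tech_activity, unexplained_compute_spike, casualty_rate]
X_train = np.array([
    [0.1, 0.0, 0.0],
    [0.9, 0.8, 0.1],
    [0.2, 0.1, 0.0],
    [0.7, 0.9, 0.6],
])
y_train = np.array([0, 1, 0, 1])   # 0 = acceptable, 1 = dangerous

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

new_situation = np.array([[0.8, 0.7, 0.2]])
# In the scenario, a high predicted danger probability would trigger
# intervention via the Nanny's actuators; here it just prints a number.
print(clf.predict_proba(new_situation)[0][1])
```

Michael's objection below is essentially that this framing overfits: real-world special cases overwhelm any fixed labeling scheme.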

Michael:

Again, overfitting. Perhaps it’s desirable to me to risk my life walking a tightrope; an intelligently designed Nanny AI would be forced to stop me. The number of special cases is too extreme; it requires real understanding. Otherwise, why bother with a Nanny AI at all? Why not create an AI that just fulfills the wishes of a single steward human? I’d rather have someone with real understanding in control than a stupid AI that is very powerful but lacks basic common sense and the ability to change or listen. If something is more powerful than me, I want it to be more philosophically sophisticated and benevolent than me, or I’m likely against it. (Many people seem to be against the idea of any agents more powerful than them, period, which is ironic because I don’t exactly see them trying to improve their power much either.)

Ben:

Yeah, it requires real understanding — but IMO much less real understanding than maintaining goal stability under radical self-modification…

As to why to prefer a Nanny AI to a human dictator, it’s because for humans power tends to corrupt, whereas for AGIs it won’t necessarily.  And democratic human institutions appear probably unable to handle the increasingly dangerous technologies that are going to emerge in the next N decades…

Another point is that personally, as an AI researcher, I feel fairly confident I know how to make a Nanny AI that could work.  I also know how to make an AGI that keeps its goal system stable under radical self-modification — BUT the catch is, this design (GOLEM) requires infeasibly large amounts of computational resources.  I do NOT know how to make an AGI that keeps its goal system stable under radical self-modification and runs using feasible computational resources, and I wonder if designing such a thing is waaaay beyond our current scientific/engineering/mathematical capability.

Michael:

How on Earth could you be confident that you could create a Nanny AI with  your current knowledge?  What mechanism would you design to allow us to break the AI’s control once we were ready? Who would have control of said mechanism?

Ben:

The AI would need to break its control itself, once it was convinced we had a plausible solution to the problem of building a self-modifying AGI with a stable goal system.  Human dictators don’t like to break their own control (though it’s happened) but AIs needn’t have human motivations…

Michael:

If the AI can make this judgment, couldn’t it build a solution itself?  An AI with the power to be a Nanny would have more cognitive resources than all human beings that have ever lived.

Ben:

Very often in computer science and ordinary human life, *recognizing* a solution to a problem is much easier than actually finding the solution….  This is the basis of the theory of NP-completeness for example….  And of course the scientists who validated Einstein’s General Relativity theory mostly would not have been able to originate it themselves…
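A small worked illustration of that verification-versus-search asymmetry, using Boolean satisfiability as the stock example: checking a proposed assignment takes one pass over the clauses, while finding one may require examining all 2^n assignments in the worst case. The formula below is hypothetical.

```python
from itertools import product

# A CNF formula as a list of clauses; each literal is (variable, is_positive).
# Hypothetical example: (x0 or not x1) and (x1 or x2)
formula = [[(0, True), (1, False)], [(1, True), (2, True)]]

def verify(assignment, formula):
    """Checking a candidate solution: one pass over the clauses."""
    return all(any(assignment[v] == pos for v, pos in clause) for clause in formula)

def search(formula, n_vars):
    """Finding a solution: in the worst case, all 2**n_vars assignments."""
    for bits in product([False, True], repeat=n_vars):
        if verify(bits, formula):
            return bits
    return None

print(verify((True, False, True), formula))  # fast check of a given answer
print(search(formula, 3))                    # exhaustive search for an answer
```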

However, one quite likely outcome IMO is that the Nanny AI, rather than human scientists, is the one to find the solution to “radically self-modifying AGI with a stable goal system” … thus obsoleting itself ;) ….  Or maybe it will be a collaboration of the  Nanny with the global brain of internetworked brain-enhanced humans … it should be fun to find out which path is ultimately taken!! …

Michael:

OK, that last clause sounds vaguely similar to my SIAI colleague Eliezer Yudkowsky’s idea of “Coherent Extrapolated Volition”, which I’m sure you’re familiar with.  CEV, also, would start out gathering info and would only create the AI to replace itself once it was confident it did the extrapolation well. This would involve inferring the structure and function of human brains.

Ben:

I admit the details of the CEV idea, as Eliezer explained it, never made that much sense to me.  But maybe you’re interpreting it a little differently.

Michael:

CEV just means extrapolating what people want. We do this every day. Even salamander mothers do this for their children. The cognitive machinery that takes sense perceptions of an agent and infers what that agent wants from those sense perceptions is the same sort of machinery that would exist in CEV.

Ben:

Hmmm…. That’s not exactly how I understood it from Eliezer’s paper….   What he said is:

our coherent extrapolated volition is our wish if we knew more, thought faster, were more the people we wished we were, had grown up farther together; where the extrapolation converges rather than diverges, where our wishes cohere rather than interfere; extrapolated as we wish that extrapolated, interpreted as we wish that interpreted.

To me this seems a lot different than just “extrapolating what people want” …

Michael:

The thing is that if all these qualifications were not here, extrapolation would lead to suboptimal outcomes. For instance, you must have made decisions for your children that were more in alignment with what they would want if they were smarter. If you made judgments in alignment with their actual preferences (like wanting to eat candy all day — I don’t know your kids but I know a lot of kids would do this), they would suffer for it in the longer term.

If extrapolations were made taking into account only our current knowledge, and not our knowledge if we knew more, really bad things could happen.

If extrapolations were made based on our human-characteristic thinking speeds, rather than the long-term equilibria of thinking that we would reach immediately if we thought faster, bad things could happen.

If extrapolations were made based on the people we are — often petty and under the control of short-term motivations, rather than who we wished we were, bad things could happen.

The same goes for each element above. I can understand why you might disagree with some of these points, but it’s hard to imagine how you could disagree with the notion of volition extrapolation in general.  It is a marvel of human intelligence and inference that “no” sometimes means “yes” and “yes” sometimes means “no”. An AI without a subtle extrapolation process will miss this entirely, and make choices for us that are too closely tied to our current states, producing lock-in that we would never have chosen if we were superintelligences.

Salamanders extrapolate preferences. Humans extrapolate preferences. Superintelligences will extrapolate preferences. Each new level of intelligence demands a higher level of wisdom for extrapolation. A superintelligence that uses human-level extrapolation algorithms to fulfill wishes would be a menace.

Ben:

Hmmmm…..  Well, to me, the idea of “what Bob would want, if Bob were more of the person Bob wishes he was” is a bit confusing, because “the person Bob wishes he was” is a difficult sort of abstraction.  Bob doesn’t usually genuinely know what kind of person he wishes he was.  He may think he wishes he was an enlightened Zen master – and if he became an enlightened Zen master he might be quite contented that way – but the fact that he never took action to become that Zen master during his life, in spite of many opportunities, still indicates that large parts of him didn’t really want that….  The notion of “the person you want to be” isn’t well-defined at all….

And looking at cases where different people’s wishes cohere is pretty dangerous too. For one thing, you’d likely be throwing out your valued rationality, as that is certainly not something on which most people’s wishes cohere.  Belief in reincarnation is more likely to make it into the CEV of the human race than rationality.

And looking at the desires you would have if you were “more of the person you wish you were” is probably going to exaggerate the incoherence problem, not mitigate it.   Most religious people will wish they were even MORE religious and god-fearing, so I’d be less coherent with their ideals than I am with their actual selves…

Michael:

“More like the person I wish I was” is not a difficult abstraction. I have many desired modifications to my mental architecture, and I would prefer that an AI take those into account in its judgments. If Bob has dark thoughts at times, Bob wouldn’t want those dark thoughts to be integrated into the preference aggregation algorithm. It seems simple enough. Without this explicit precaution, the dark thoughts that Bob would choose to exclude from the preference aggregator would be included anyhow.

The list of items in the definition of Coherent Extrapolated Volition is a way of saying to the AI, “take this into account too”. The alternative is to not take them into account. That seems bad, because these items obviously should be taken into account.

Ben:

Hmmm… but I think separating out certain thoughts of Bob’s from the rest of his mind is not a very easy or well-defined task either.  The human mind is not a set of discretely-defined logical propositions; it’s a strangely-tangled web of interdefinitions, right?

You may not remember, but a couple of years ago, in reaction to some of my concerns with the details of the CEV idea, I defined an alternative called Coherent Aggregated Volition (CAV).  The idea was to come up with a CEV-like idea that seemed less problematic.  Basically, CAV is about trying to find a goal that maximizes several criteria together: consistency, close match on average to what a lot of people want, compactness, and supportedness by evidence.  I guess this fits within your broad notion of “extrapolation”, but it seems rather different from CEV the way Eliezer stated it.
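A minimal sketch of how such multi-criteria CAV scoring might look, with candidates modeled as sets of value-statements; the scoring rules, weights, and example data are crude placeholders rather than a worked-out formalism.

```python
# Illustrative-only sketch of the CAV idea: score candidate goal contents on
# several criteria at once and pick the best compromise. Candidates are
# modeled as sets of value-statements; all scoring rules are placeholders.

def cav_score(candidate, population, weights):
    # average_match: mean overlap with what each person actually wants
    avg_match = sum(len(candidate & person) / max(len(person), 1)
                    for person in population) / len(population)
    # compactness: prefer smaller, simpler value sets
    compact = 1.0 / (1 + len(candidate))
    # consistency: penalize holding a statement and its negation
    consistent = 0.0 if any(("not " + v) in candidate for v in candidate) else 1.0
    return (weights["match"] * avg_match
            + weights["compact"] * compact
            + weights["consistent"] * consistent)

population = [{"avoid suffering", "preserve autonomy"},
              {"avoid suffering", "pursue knowledge"}]
candidates = [{"avoid suffering"},
              {"avoid suffering", "preserve autonomy", "pursue knowledge"}]
weights = {"match": 1.0, "compact": 0.3, "consistent": 1.0}

best = max(candidates, key=lambda c: cav_score(c, population, weights))
print(best)
```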

Michael:

This is extremely undesirable, because the idea is not to average out our existing preferences, but to create something new that can serve as a foundation for the future. Similarity to existing value systems should not be a criterion. We are not trying to create a buddy but a Transition Guide, a massively powerful entity whose choices will de facto set the stage for our entire future light cone. The tone of this work, especially with regard to the language about the averaging of existing preferences, does not take the AI’s role as Transition Guide sufficiently into account.

Ben:

Hmmm …. I just think the Transition Guide should start from where we are, not from where we (or our optimization algorithm) speculate we might be if our ideal were much smarter, etc….

I think we should provide a superhuman AI initially with some basic human values, not with some weird wacky far-out extrapolation that bears no noticeable resemblance to current human values….  Sure, a little extrapolation is needed, but only a little….

Still, I guess I can agree with you that some idea in the vague vicinity of what you guys call “CEV” is probably valuable.  I could write a detailed analysis of why I think the details of Eli’s CEV paper are non-workable, but that would take a long day’s work, and I don’t want to put a day into it right now.  Going back and forth any further on particular points via email probably isn’t productive….

Perhaps one way to express the difference is that:

- CAV wants to get at the core of real, current human values, as manifested in real human life

- CEV wants to get at the core of “what humans would like their values to be”, as manifested in what we would like our life to be if we were all better people who were smarter and knew more

Does that feel right to you as a rough summary of the distinction?

Michael:

Yes, the difference between CEV and CAV that you list makes sense.

Ben:

I suppose what I’m getting at is – I think there is a depth and richness and substance to our current, real human values; whereas I think that “what we would like our values to be” is more of a web of fictions, not to be relied upon….

That is – to put it more precisely — suppose one used some sort of combination of CEV and CAV, i.e. something that took into account both current human values, and appropriately extrapolated human values.  And suppose this combination was confidence-weighted, i.e. it paid more attention to those values that were known with more confidence.  Then, my suspicion is that when the combination was done, one would find that the CAV components dominated the CEV components, because of the huge uncertainties in the latter…  But it seems you have a very different intuition on this…
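A back-of-the-envelope illustration of that confidence-weighting, with made-up numbers: each value estimate is weighted by its confidence, so a high-uncertainty extrapolated (CEV-style) component contributes little relative to a well-grounded current-values (CAV-style) component.

```python
# Purely illustrative: combine "current values" (CAV-style) and "extrapolated
# values" (CEV-style) estimates for one value dimension, weighting each by
# confidence. Numbers are made up to show the arithmetic only.

estimates = [
    # (source, estimated_value, confidence in [0, 1])
    ("CAV: observed current preference", 0.6, 0.9),
    ("CEV: extrapolated idealized preference", 0.9, 0.2),
]

total_weight = sum(conf for _, _, conf in estimates)
combined = sum(value * conf for _, value, conf in estimates) / total_weight
print(round(combined, 3))   # ~0.655 -- dominated by the high-confidence CAV term
```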

But anyway … I can see this is a deep point that goes beyond the scope of an interview like this one!  Actually this is turning into more of a debate than an interview, which is good fun as well.  But, I think I’d better move on to my next question!

So here goes…  Another proposal that’s been suggested, to mitigate the potential existential risk of human-level or superhuman AGIs, is to create a community of AGIs and have them interact with each other, comprising a society with its own policing mechanisms and social norms and so forth.  The different AGIs would then keep each other in line.  A “social safety net” so to speak.  Steve Omohundro, for example, has been a big advocate of this approach.

What are your thoughts on this sort of direction?

Michael:

Creating a community of AIs is just a way of avoiding the challenge of making an AI you trust.

Create an AI you trust, then worry about the rest. An AI that understands us. Someone we can call our friend, our ally. An agent really on our side. Then the rest will follow. The key is not to see AI as an alien but as a potential friend. Regard AI as necessarily our enemy, and we will fail.

The universe is not fundamentally Darwinian. If the nice guy has all the weapons, all the control, then the thief and the criminal are screwed. We can defeat death. That’s an affront to Darwinian evolution if there ever was one.  We don’t need to balance AIs off against each other. We need a proxy to a process that represents what we want.

An AI is not like a human individual. A single AI could actually be legion. A single AI might split its awareness into thousands of threads as necessary. Watson does this all the time when it searches through many thousands of documents in parallel.

We don’t need to choose exactly what we want right away. We can just set up a system that leaves the option open in the future. Something that doesn’t lock us into any particular local maximum in the fitness space.

Eliezer nailed this question in 2001.  He really had his thumb right on it.  From the FAQ section of Creating Friendly AI:

Aren’t individual differences necessary to intelligence?  Isn’t a society necessary to produce ideas?  Isn’t capitalism necessary for efficiency?

Individual differences and the free exchange of ideas are necessary to human intelligence because it’s easy for a human to get stuck on one idea and then rationalize away all opposition. One scientist has one idea, but then gets stuck on it and becomes an obstacle to the next generation of scientists. A Friendly seed AI doesn’t rationalize.  Rationalization of mistaken ideas is a complex functional adaptation that evolves in imperfectly deceptive social organisms. Likewise, there are limits to how much experience any one human can accumulate, and we can’t share experiences with each other. There’s a limit to what one human can handle, and so far it hasn’t been possible to build bigger humans.

As for the efficiency of a capitalist economy, in which the efforts of self-interested individuals sum to a (sort of) harmonious whole:  Human economies are constrained to be individualist because humans are individualist. Local selfishness is not the miracle that enables the marvel of a globally efficient economy; rather, all human economies are constrained to be locally selfish in order to work at all. Try to build an economy in defiance of human nature, and it won’t work. This constraint is not necessarily something that carries over to minds in general.

Humans have to cooperate because we’re idiots when we think alone, due to egocentric biases. The same does not necessarily apply to AGI. You can make an AGI that avoids egocentric biases from the get-go. People have trouble understanding this because they anthropomorphize, and find it impossible to imagine such a being.  They can doubt, but the empirical evidence will flood in from early experiments in infrahuman AI.  You can call me on this in 2020.

Ben:

Hmmm….  I understand your and Eliezer’s view, but then some other deep futurist thinkers such as Steve Omohundro feel differently.  As I understand it, Steve feels that a trustable community might be easier to create than a trustable “singleton” mind.  And I don’t really think he is making any kind of simplistic anthropomorphic error.   Rather (I think) he thinks that cooperation between minds is a particular sort of self-organizing dynamic that implicitly gives rise to certain emergent structures (like morality for instance) via its self-organizing activity….

But maybe this is just a disagreement about AGI architecture — i.e. you could say he wants to architect an AGI as a community of relatively distinct subcomponents, whereas you want to architect it with a more unified internal architecture??

Michael:

Possibly! Maybe the long-term outcome will be determined by which is easier to build, and my preferences don’t matter because one is just inherently more practical. Most successful AIs, like Watson and Google, seem to have unified architectures. The data-gathering infrastructure may be distributed but the decision-making, while probabilistic, is more or less unified.

Ben:

One final question, then.  What do you think society could be doing now to better mitigate existential risks … from AGI or from other sources?  More specific answers will be more fully appreciated ;)

Michael:

Create a human-friendly superintelligence. The arguments for why this is a good idea have been laid out numerous times, and this is the focus of Nick Bostrom’s essay “Ethical Issues in Advanced Artificial Intelligence”. An increasing majority of transhumanists are adopting this view.

Ben:

Hmmm…. do you have any evidence in favor of the latter sentence?  My informal impression is otherwise, though I don’t have any evidence about the matter….  I wonder if your impression is biased due to your own role in the Singularity Institute, a community that certainly does take that view.  I totally agree with your answer, of course – but I sure meet a lot of futurists who don’t.

Michael:

I mostly communicate with transhumanists who are not already Singularitarians, because I am an employee who interfaces with the community outside SIAI. I have also gotten hundreds of emails from media coverage over the past two years. If anything, my view is biased towards transhumanists in the Bay Area, of which there are many, but not necessarily transhumanists with direct contact with those who already advocate friendly superintelligence.

Ben:

Very interesting indeed.  Well, I hope you’re right that this trend exists; it’s good to see the futurist community adopting a more and more realistic view on these issues, which, as you point out, are as important as anything on the planet these days.