How Dangerous is Artificial General Intelligence? — Muehlhauser Interviews Goertzel

What will advanced AI systems — Artificial General Intelligences — be like?   How will they relate to human beings?  How will they help transform human beings into posthuman forms?  Might they turn against their creators?

These questions have been explored extensively in science fiction.   But as technology advances during the next decades, they may transform from  theoretical and science-fictional issues into very urgent practical ones.   In this light, it’s worth noting that there is nothing near a consensus on such issues within the relevant science, engineering and intellectual communities.  Rather, there is a wild diversity of views, some of them strongly held.   And this is probably appropriate.  At this stage, when we still know so little about what advanced AGIs are going to be like, it’s worth entertaining a variety of perspectives, and trying to understand the issues as best we can.  This is the spirit in which the following dialogue is presented, in which the Singularity Institute’s Executive Director Luke Muehlhauser interviews AGI researcher (and Humanity+ Magazine Chief Editor, and Humanity+ Vice Chairman) Ben Goertzel on the nature and risks of AGI.

The Singularity Institute for AI (SIAI) is one contemporary institution with rather definitive and strongly held views on the risks of advanced AGI development.    Ben Goertzel has a long relationship with the Singularity Institute, much of which is detailed here, which has been marked by some serious intellectual and practical disagreements, along with a lot of commonality of interest and purpose. Put simply and roughly, the main area of disagreement between the two sides is as follows:

  • SIAI tends toward the orientation that advanced AGIs are very likely to prove destructive to humans and human values, unless (via some currently-unknown theory, which is suspected to relate to Bayesian probability theory and decision theory) they can be very specifically designed not to do so
  • Goertzel, while admitting that “unfriendly” AGIs antithetical to human values are a real possibility that’s hard to discount wholly, is more optimistic about the possibility of creating beneficial AGI systems via a combination of intelligent engineering and appropriate education

Luke’s interview with Ben digs into some of these disagreements in a fair bit of detail, along with a number of other AGI-related issues.  Much of the dialogue deals with somewhat technical issues regarding rationality and goals, as these notions are central to SIAI’s view of AGI — but by the end, the conversation converges on the topic of “AGI safety and risks” that lies at the heart of the SIAI and its perspective.   Also note: the interview was recently posted on the Less Wrong blog site, which is frequented by many fans of the SIAI’s views; you may be interested to go there and read the comments of the Less Wrong community on Luke’s and Ben’s ideas.


Luke Muehlhauser:

Ben, I’m glad you agreed to discuss artificial general intelligence (AGI) with me. There is much on which we agree, and much on which we disagree, so I think our dialogue will be informative to many readers, and to us!

Let us begin where we agree. We seem to agree that:

  1. Involuntary death is bad, and can be avoided with the right technology.
  2. Humans can be enhanced by merging with technology.
  3. Humans are on a risky course in general, because powerful technologies can destroy us, humans are often stupid, and we are unlikely to voluntarily halt technological progress.
  4. AGI is likely this century.
  5. AGI will, after a soft or hard takeoff, completely transform the world. It is a potential existential risk, but if done wisely, could be the best thing that ever happens to us.
  6. Careful effort will be required to ensure that AGI results in good things for humanity.

Next: Where do we disagree?

Two people might agree about the laws of thought most likely to give us an accurate model of the world, but disagree about which conclusions those laws of thought point us toward. For example, two scientists may use the same scientific method but offer two different models that seem to explain the data.

Or, two people might disagree about the laws of thought most likely to give us accurate models of the world. If that’s the case, it will be no surprise that we disagree about which conclusions to draw from the data. We are not shocked when scientists and theologians end up with different models of the world.

Unfortunately, I suspect you and I disagree at the more fundamental level — about which methods of reasoning to use when seeking an accurate model of the world.

I sometimes use the term “Technical Rationality” to name my methods of reasoning. Technical Rationality is drawn from two sources: (1) the laws of logic, probability theory, and decision theory, and (2) the cognitive science of how our haphazardly evolved brains fail to reason in accordance with those laws.
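The decision-theoretic core of this is expected-utility maximization: pick the action that maximizes the probability-weighted sum of utilities over states. A minimal sketch (the actions, states, probabilities, and utilities here are invented purely for illustration):

```python
# Toy expected-utility decision: the agent chooses the action whose
# probability-weighted utility over possible world-states is highest.
actions = {
    "umbrella":    {"rain": 5, "sun": 3},   # utility of each (action, state)
    "no_umbrella": {"rain": 0, "sun": 6},
}
p = {"rain": 0.4, "sun": 0.6}               # the agent's credences

def expected_utility(action):
    return sum(p[state] * u for state, u in actions[action].items())

best = max(actions, key=expected_utility)
print(best, expected_utility(best))  # umbrella 3.8
```

Here EU(umbrella) = 0.4·5 + 0.6·3 = 3.8 beats EU(no_umbrella) = 3.6, so a technically rational agent with these (hypothetical) credences takes the umbrella.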

Ben, at one time you tweeted a William S. Burroughs quote: “Rational thought is a failed experiment and should be phased out.” I don’t know whether Burroughs meant by “rational thought” the specific thing I mean by “rational thought,” or what exactly you meant to express with your tweet, but I suspect we have different views of how to reason successfully about the world.

I think I would understand your way of thinking about AGI better if I understood your way of thinking about everything. For example: do you have reason to reject the laws of logic, probability theory, and decision theory? Do you think we disagree about the basic findings of the cognitive science of humans? What are your positive recommendations for reasoning about the world?

Ben Goertzel:

Firstly, I don’t agree with that Burroughs quote that “Rational thought is a failed experiment” — I mostly just tweeted it because I thought it was funny! I’m not sure Burroughs agreed with his own quote either. He also liked to say that linguistic communication was a failed experiment, introduced by women to help them oppress men into social conformity. Yet he was a writer and loved language. He enjoyed being a provocateur.

However, I do think that some people overestimate the power and scope of rational thought. That is the truth at the core of Burroughs’ entertaining hyperbolic statement….

I should clarify that I’m a huge fan of logic, reason and science. Compared to the average human being, I’m practically obsessed with these things! I don’t care for superstition, nor for unthinking acceptance of what one is told; and I spend a lot of time staring at data of various sorts, trying to understand the underlying reality in a rational and scientific way. So I don’t want to be pigeonholed as some sort of anti-rationalist!

However, I do have serious doubts both about the power and scope of rational thought in general — and much more profoundly, about the power and scope of what you call “technical rationality.”

First of all, about the limitations of rational thought broadly conceived — what one might call “semi-formal rationality”, as opposed to “technical rationality.” Obviously this sort of rationality has brought us amazing things, like science and mathematics and technology.  Hopefully it will allow us to defeat involuntary death and increase our IQs by orders of magnitude and discover new universes, and all sorts of great stuff. However, it does seem to have its limits.

It doesn’t deal well with consciousness — studying consciousness using traditional scientific and rational tools has just led to a mess of confusion. It doesn’t deal well with ethics either, as the current big mess regarding bioethics indicates.

And this is more speculative, but I tend to think it doesn’t deal that well with the spectrum of “anomalous phenomena” — precognition, extrasensory perception, remote viewing, and so forth. I strongly suspect these phenomena exist, and that they can be understood to a significant extent via science — but also that science as presently constituted may not be able to grasp them fully, due to issues like the mindset of the experimenter helping mold the results of the experiment.

There’s the minor issue of Hume’s problem of induction, as well. I.e., the issue that, within the rational and scientific world-view, we have no rational reason to believe that any patterns observed in the past will continue into the future. This is an ASSUMPTION, plain and simple — an act of faith. Occam’s Razor (which is one way of justifying and/or further specifying the belief that patterns observed in the past will continue into the future) is also an assumption and an act of faith. Science and reason rely on such acts of faith, yet provide no way to justify them. A big gap.

Furthermore — and more to the point about AI — I think there’s a limitation to the way we now model intelligence, which ties in with the limitations of the current scientific and rational approach. I have always advocated a view of intelligence as “achieving complex goals in complex environments”, and many others have formulated and advocated similar views. The basic idea here is that, for a system to be intelligent it doesn’t matter WHAT its goal is, so long as its goal is complex and it manages to achieve it. So the goal might be, say, reshaping every molecule in the universe into an image of Mickey Mouse.

This way of thinking about intelligence, in which the goal is strictly separated from the methods for achieving it, is very useful and I’m using it to guide my own practical AGI work.
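The separation can be made concrete in code: a generic optimizer never needs to inspect what the goal is, only the score the goal assigns to candidate states. A toy sketch (the hill-climber and both goals are invented for illustration; any goal function, sensible or arbitrary, plugs in equally well):

```python
import random

def hill_climb(goal, start, neighbors, steps=1000, seed=0):
    """Generic optimizer: it knows nothing about WHAT the goal is,
    only how candidate states score against it."""
    rng = random.Random(seed)
    state = start
    for _ in range(steps):
        candidate = rng.choice(neighbors(state))
        if goal(candidate) >= goal(state):  # accept non-worsening moves
            state = candidate
    return state

# Neighbors of an integer: one step down or up.
step = lambda x: [x - 1, x + 1]

# Two unrelated goals, same optimizer.
near_42    = lambda x: -abs(x - 42)   # a "sensible"-looking goal
near_neg7  = lambda x: -abs(x + 7)    # an equally optimizable arbitrary goal

print(hill_climb(near_42, 0, step))    # 42
print(hill_climb(near_neg7, 0, step))  # -7
```

The optimizer is exactly as competent at either goal; nothing in its machinery distinguishes a "wise" target from a Mickey Mouse one.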

On the other hand, there’s also a sense in which reshaping every molecule in the universe into an image of Mickey Mouse is a STUPID goal. It’s somehow out of harmony with the Cosmos — at least that’s my intuitive feeling. I’d like to interpret intelligence in some way that accounts for the intuitively apparent differential stupidity of different goals. In other words, I’d like to be able to deal more sensibly with the interaction of scientific and normative knowledge.

This ties in with the incapacity of science and reason in their current forms to deal with ethics effectively, which I mentioned a moment ago.

I certainly don’t have all the answers here — I’m just pointing out the complex of interconnected reasons why I think contemporary science and rationality are limited in power and scope, and are going to be replaced by something richer and better as the growth of our individual and collective minds progresses. What will this new, better thing be? I’m not sure — but I have an inkling it will involve an integration of “third person” science/rationality with some sort of systematic approach to first-person and second-person experience.

Next, about “technical rationality” — of course that’s a whole other can of worms. Semi-formal rationality has a great track record; it’s brought us science and math and technology, for example. So even if it has some limitations, we certainly owe it some respect! Technical rationality has no such track record, and so my semi-formal scientific and rational nature impels me to be highly skeptical of it! I have no reason to believe, at present, that focusing on technical rationality (as opposed to the many other ways to focus our attention, given our limited time and processing power) will generally make people more intelligent or better at achieving their goals. Maybe it will, in some contexts — but what those contexts are, is something we don’t yet understand very well.

I provided consulting once to a project aimed at using computational neuroscience to understand the neurobiological causes of cognitive biases in people employed to analyze certain sorts of data. This is interesting to me; and it’s clear to me that in this context, minimization of some of these textbook cognitive biases would help these analysts to do their jobs better. I’m not sure how big an effect the reduction of these biases would have on their effectiveness, though, relative to other changes one might make, such as changes to their workplace culture or communication style.

On a mathematical basis, the justification for positing probability theory as the “correct” way to do reasoning under uncertainty relies on arguments like Cox’s axioms, or de Finetti’s Dutch Book arguments.  These are beautiful pieces of math, but when you talk about applying them to the real world, you run into a lot of problems regarding the inapplicability of their assumptions. For instance, Cox’s axioms include an axiom specifying that (roughly speaking) multiple pathways of arriving at the same conclusion must lead to the same estimate of that conclusion’s truth value. This sounds sensible but in practice it’s only going to be achievable by minds with arbitrarily much computing capability at their disposal. In short, the assumptions underlying Cox’s axioms, de Finetti’s arguments, or any of the other arguments in favor of probability theory as the correct way of reasoning under uncertainty, do NOT apply to real-world intelligences operating under strictly bounded computational resources. They’re irrelevant to reality, except as inspirations to individuals of a certain cast of mind.
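For readers unfamiliar with de Finetti's Dutch Book argument, its core fits in a few lines: credences that violate the probability axioms license bets that lose in every possible outcome. A toy sketch with invented numbers:

```python
# Hypothetical bettor whose credences violate the probability axioms:
# P(rain) = 0.6 and P(no rain) = 0.6, summing to 1.2 rather than 1.
p_rain, p_no_rain = 0.6, 0.6

# By their own lights, 60 cents is a fair price for a $1 ticket on each
# outcome, so a bookie sells them both tickets.
cost = p_rain + p_no_rain   # 1.2 paid up front

# Exactly one ticket pays $1, whichever way the weather goes.
for outcome in ("rain", "no rain"):
    net = 1.0 - cost
    print(outcome, round(net, 2))   # -0.2 in every outcome: a sure loss
```

The argument shows coherent credences must obey the probability axioms; Goertzel's point above is that the argument's idealizing assumptions, not its algebra, are what fail for bounded reasoners.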

(An aside is that my own approach to AGI does heavily involve probability theory — using a system I invented called Probabilistic Logic Networks, which integrates probability and logic in a unique way.  I like probabilistic reasoning. I just don’t venerate it as uniquely powerful and important. In my OpenCog AGI architecture, it’s integrated with a bunch of other AI methods, which all have their own strengths and weaknesses.)

So anyway — there’s no formal mathematical reason to think that “technical rationality” is a good approach in real-world situations; and “technical rationality” has no practical track record to speak of.  And ordinary, semi-formal rationality itself seems to have some serious limitations of power and scope.

So what’s my conclusion? Semi-formal rationality is fantastic and important and we should use it and develop it — but also be open to the possibility of its obsolescence as we discover broader and more incisive ways of understanding the universe (and this is probably moderately close to what William Burroughs really thought). Technical rationality is interesting and well worth exploring but we should still be pretty skeptical of its value, at this stage — certainly, anyone who has supreme confidence that technical rationality is going to help humanity achieve its goals better, is being rather IRRATIONAL ;-) ….

In this vein, I’ve followed the emergence of the Less Wrong community with some amusement and interest. One ironic thing I’ve noticed about this community of people intensely concerned with improving their personal rationality is: by and large, these people are already hyper-developed in the area of rationality, but underdeveloped in other ways! Think about it — who is the prototypical Less Wrong meetup participant? It’s a person who’s very rational already, relative to nearly all other humans — but relatively lacking in other skills like intuitively and empathically understanding other people. But instead of focusing on improving their empathy and social intuition (things they really aren’t good at, relative to most humans), this person is focusing on fine-tuning their rationality more and more, via reprogramming their brains to more naturally use “technical rationality” tools! This seems a bit imbalanced. If you’re already a fairly rational person but lacking in other aspects of human development, the most rational thing may be NOT to focus on honing your “rationality fu” and better internalizing Bayes’ rule into your subconscious — but rather on developing those other aspects of your being….

An analogy would be: If you’re very physically strong but can’t read well, and want to self-improve, what should you focus your time on? Weight-lifting or literacy? Even if greater strength is ultimately your main goal, one argument for focusing on literacy would be that you might read something that would eventually help you weight-lift better! Also you might avoid getting ripped off by a corrupt agent offering to help you with your bodybuilding career, due to being able to read your own legal contracts.

Similarly, for people who are more developed in terms of rational inference than other aspects, the best way for them to become more rational might be for them to focus time on these other aspects (rather than on fine-tuning their rationality), because this may give them a deeper and broader perspective on rationality and what it really means.

Finally, you asked: “What are your positive recommendations for reasoning about the world?” I’m tempted to quote Nietzsche’s Zarathustra, who said “Go away from me and resist Zarathustra!” I tend to follow my own path, and generally encourage others to do the same.  But I guess I can say a few more definite things beyond that….

To me it’s all about balance. My friend Allan Combs calls himself a “philosophical Taoist” sometimes; I like that line! Think for yourself; but also, try to genuinely listen to what others have to say. Reason incisively and analytically; but also be willing to listen to your heart, gut and intuition, even if the logical reasons for their promptings aren’t apparent. Think carefully through the details of things; but don’t be afraid to make wild intuitive leaps. Pay close mind to the relevant data and observe the world closely and particularly; but don’t forget that empirical data is in a sense a product of the mind, and facts only have meaning in some theoretical context. Don’t let your thoughts be clouded by your emotions; but don’t be a feeling-less automaton, don’t make judgments that are narrowly rational but fundamentally unwise. As Ben Franklin said, “Moderation in all things, including moderation.”



Luke Muehlhauser:

I whole-heartedly agree that there are plenty of Less Wrongers who, rationally, should spend less time studying rationality and more time practicing social skills and generic self-improvement methods! This is part of why I’ve written so many scientific self-help posts for Less Wrong: Scientific Self Help, How to Beat Procrastination, How to Be Happy, Rational Romantic Relationships, and others. It’s also why I taught social skills classes at our two summer 2011 rationality camps.

Back to rationality. You talk about the “limitations” of “what one might call ‘semi-formal rationality’, as opposed to ‘technical rationality.’” But I argued for technical rationality, so: what are the limitations of technical rationality? Does it, as you claim for “semi-formal rationality,” fail to apply to consciousness or ethics or precognition? Does Bayes’ Theorem remain true when looking at the evidence about awareness, but cease to be true when we look at the evidence concerning consciousness or precognition?

You talk about technical rationality’s lack of a track record, but I don’t know what you mean. Science was successful because it did a much better job of approximating perfect Bayesian probability theory than earlier methods did (e.g. faith, tradition), and science can be even more successful when it tries harder to approximate perfect Bayesian probability theory — see The Theory That Would Not Die.

You say that “minimization of some of these textbook cognitive biases would help [some] analysts to do their jobs better. I’m not sure how big an effect the reduction of these biases would have on their effectiveness, though, relative to other changes one might make, such as changes to their workplace culture or communication style.” But this misunderstands what I mean by Technical Rationality.  If teaching these people about cognitive biases would lower the expected value of some project, then technical rationality would recommend against teaching these people cognitive biases (at least, for the purposes of maximizing the expected value of that project). Your example here is a case of Straw Man Rationality.  (But of course I didn’t expect you to know everything I meant by Technical Rationality in advance! Though, I did provide a link to an explanation of what I meant by Technical Rationality in my first entry, above.)

The same goes for your dismissal of probability theory’s foundations. You write that “In short, the assumptions underlying Cox’s axioms, de Finetti’s arguments, or any of the other arguments in favor of probability theory as the correct way of reasoning under uncertainty, do NOT apply to real-world intelligences operating under strictly bounded computational resources.” Yes, we don’t have infinite computing power. The point is that Bayesian probability theory is an ideal that can be approximated by finite beings. That’s why science works better than faith — it’s a better approximation of using probability theory to reason about the world, even though science is still a long way from a perfect use of probability theory.
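The ideal being approximated is just iterated application of Bayes' rule. A minimal worked example (the coin hypotheses and numbers are invented for illustration):

```python
from fractions import Fraction as F

# Two hypotheses about a coin: fair, or biased toward heads.
prior      = {"fair": F(1, 2), "biased": F(1, 2)}
likelihood = {"fair": F(1, 2), "biased": F(9, 10)}   # P(heads | hypothesis)

# Observe heads three times in a row; apply Bayes' rule after each toss.
post = dict(prior)
for _ in range(3):
    unnorm = {h: post[h] * likelihood[h] for h in post}
    total = sum(unnorm.values())
    post = {h: unnorm[h] / total for h in unnorm}

print(post["biased"])   # 729/854, roughly 0.854
```

Each observation mechanically shifts credence toward the hypothesis that better predicted it; on the view above, good science is a (very lossy) social approximation of this update rule.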

Re: goals. Your view of intelligence as “achieving complex goals in complex environments” does, as you say, assume that “the goal is strictly separated from the methods for achieving it.” I prefer a definition of intelligence as “efficient cross-domain optimization,” but my view — like yours — also assumes that goals (what one values) are logically orthogonal to intelligence (one’s ability to achieve what one values).

Nevertheless, you report an intuition that shaping every molecule into an image of Mickey Mouse is a “stupid” goal. But I don’t know what you mean by this. A goal of shaping every molecule into an image of Mickey Mouse is an instrumentally intelligent goal if one’s utility function will be maximized that way. Do you mean that it’s a stupid goal according to your goals? But of course. This is, moreover, what we would expect your intuitive judgments to report, even if your intuitive judgments are irrelevant to the math of what would and wouldn’t be an instrumentally intelligent goal for a different agent to have. The Mickey Mouse goal is “stupid” only by a definition of that term that is not the opposite of the explicit definitions either of us gave “intelligent,” and it’s important to keep that clear. And I certainly don’t know what “out of harmony with the Cosmos” is supposed to mean.

Re: induction. I won’t dive into that philosophical morass here.  Suffice it to say that my views on the matter are expressed pretty well in Where Recursive Justification Hits Bottom, which is also a direct response to your view that science and reason are great but rely on “acts of faith.”

Your final paragraph sounds like common sense, but it’s too vague, as I think you would agree. One way to force a more precise answer to such questions is to think of how you’d program it into an AI. As Daniel Dennett said, “AI makes philosophy honest.”

How would you program an AI to learn about reality, if you wanted it to have the most accurate model of reality possible? You’d have to be a bit more specific than “Think for yourself; but also, try to genuinely listen to what others have to say. Reason incisively and analytically; but also be willing to listen to your heart, gut and intuition…”

My own answer to the question of how I would program an AI to build as accurate a model of reality as possible is this: I would build it to use computable approximations of perfect technical rationality — that is, roughly: computable approximations of Solomonoff induction and Bayesian decision theory.
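As a toy illustration of what a computable approximation might look like: restrict the hypothesis space from "all programs" to repeating bit-patterns of bounded length, and weight each by an Occam prior of 2^(-length). This is a drastic simplification invented for this sketch, not Solomonoff's actual (uncomputable) construction:

```python
from itertools import product

def predict_next(bits, max_len=4):
    """Toy, tractable stand-in for Solomonoff induction: hypotheses are
    repeating patterns of length <= max_len, each weighted by an Occam
    prior of 2**(-length). Returns P(next bit is 1)."""
    weights = {0: 0.0, 1: 0.0}
    for n in range(1, max_len + 1):
        for pattern in product((0, 1), repeat=n):
            # Keep only patterns consistent with everything observed so far.
            if all(bits[i] == pattern[i % n] for i in range(len(bits))):
                weights[pattern[len(bits) % n]] += 2.0 ** (-n)
    total = weights[0] + weights[1]
    return weights[1] / total

print(predict_next([1, 1, 1]))           # 0.9375: simple "all ones" dominates
print(predict_next([0, 1, 0, 1, 0, 1]))  # 0.0: every fitting pattern continues with 0
```

The mixture automatically concentrates on the shortest consistent hypotheses, which is the essential behavior the full construction formalizes.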



Ben Goertzel:

Bayes’ Theorem is “always true” in a formal sense, just like 1+1=2, obviously. However, the connection between formal mathematics and subjective experience is not something that can be fully formalized.

Regarding consciousness, there are many questions, including what counts as “evidence.” In science we typically count something as evidence if the vast majority of the scientific community counts it as a real observation — so ultimately the definition of “evidence” bottoms out in social agreement. But there’s a lot that’s unclear in this process of classifying an observation as evidence via a process of social agreement among multiple minds. This unclarity is mostly irrelevant to the study of trajectories of basketballs, but possibly quite relevant to study of consciousness.

Regarding psi, there are lots of questions, but one big problem is that it’s possible the presence and properties of a psi effect may depend on the broad context of the situation in which the effect takes place. Since we don’t know which aspects of the context are influencing the psi effect, we don’t know how to construct controlled experiments to measure psi. And we may not have the breadth of knowledge nor the processing power to reason about all the relevant context to a psi experiment, in a narrowly “technically rational” way…. I do suspect one can gather solid data demonstrating and exploring psi (and based on my current understanding, it seems this has already been done to a significant extent by the academic parapsychology community; see a few links I’ve gathered here), but I also suspect there may be aspects that elude the traditional scientific method, but are nonetheless perfectly real aspects of the universe.

Anyway both consciousness and psi are big, deep topics, and if we dig into them in detail, this interview will become longer than either of us has time for…

About the success of science — I don’t really accept your Bayesian story for why science was successful. It’s naive for reasons much discussed by philosophers of science. My own take on the history and philosophy of science, from a few years back, is here (that article was the basis for a chapter in The Hidden Pattern, also). My goal in that essay was “a philosophical perspective that does justice to both the relativism and sociological embeddedness of science, and the objectivity and rationality of science.” It seems you focus overly much on the latter and ignore the former. That article tries to explain why probabilist explanations of real-world science are quite partial and miss a lot of the real story. But again, a long debate on the history of science would take us too far off track from the main thrust of this interview.

About technical rationality, cognitive biases, etc. — I did read that blog entry that you linked, on technical rationality. Yes, it’s obvious that focusing on teaching an employee to be more rational need not always be the most rational thing for an employer to do, even if that employer has a purely rationalist world-view. For instance, if I want to train an attack dog, I may do better by focusing limited time and attention on increasing his strength rather than his rationality. My point was that there’s a kind of obsession with rationality in some parts of the intellectual community (e.g. some of the Less Wrong orbit) that I find a bit excessive and not always productive. But your reply impels me to distinguish two ways this excess may manifest itself:

  1. Excessive belief that rationality is the “right” way to solve problems and think about issues, in principle
  2. Excessive belief that, tactically, explicitly employing tools of technical rationality is a good way to solve problems in the real world

Psychologically I think these two excesses probably tend to go together, but they’re not logically coupled. In principle, someone could hold either one, but not the other.

This sort of ties in with your comments on science and faith. You view science as progress over faith — and I agree if you interpret “faith” to mean “traditional religions.” But if you interpret “faith” more broadly, I don’t see a dichotomy there. Actually, I find the dichotomy between “science” and “faith” unfortunately phrased, since science itself ultimately relies on acts of faith also. The “problem of induction” can’t be solved, so every scientist must base his extrapolations from past into future on some act of faith. It’s not a matter of science vs. faith, it’s a matter of what one chooses to place one’s faith in. I’d personally rather place faith in the idea that patterns observed in the past will likely continue into the future (as one example of a science-friendly article of faith), than in the word of some supposed “God” — but I realize I’m still making an act of faith.

This ties in with the blog post “Where Recursive Justification Hits Bottom” that you pointed out. It’s pleasant reading but of course doesn’t provide any kind of rational argument against my views. In brief, according to my interpretation, it articulates a faith in the process of endless questioning:

The important thing is to hold nothing back in your criticisms of how to criticize; nor should you regard the unavoidability of loopy justifications as a warrant of immunity from questioning.

I share that faith, personally.

Regarding approximations to probabilistic reasoning under realistic conditions (of insufficient resources), the problem is that we lack rigorous knowledge about what they are. We don’t have any theorems telling us what is the best way to reason about uncertain knowledge, in the case that our computational resources are extremely restricted. You seem to be assuming that the best way is to explicitly use the rules of probability theory, but my point is that there is no mathematical or scientific foundation for this belief. You are making an act of faith in the doctrine of probability theory! You are assuming, because it feels intuitively and emotionally right to you, that even if the conditions of the arguments for the correctness of probabilistic reasoning are NOT met, then it still makes sense to use probability theory to reason about the world. But so far as I can tell, you don’t have a RATIONAL reason for this assumption, and certainly not a mathematical reason.

Re: your response to my questioning the reduction of intelligence to goals and optimization — I understand that you are intellectually committed to the perspective of intelligence in terms of optimization or goal-achievement or something similar to that. Your response to my doubts about this perspective basically just re-asserts your faith in the correctness and completeness of this sort of perspective. Your statement

The Mickey Mouse goal is “stupid” only by a definition of that term that is not the opposite of the explicit definitions either of us gave “intelligent,” and it’s important to keep that clear

basically asserts that it’s important to agree with your opinion on the ultimate meaning of intelligence!

On the contrary, I think it’s important to explore alternatives to the understanding of intelligence in terms of optimization or goal-achievement. That is something I’ve been thinking about a lot lately. However, I don’t have a really crisply-formulated alternative yet.

As a mathematician, I tend not to think there’s a “right” definition for anything. Rather, one explains one’s definitions, and then works with them and figures out their consequences. In my AI work, I’ve provisionally adopted a goal-achievement based understanding of intelligence — and have found this useful, to a significant extent.  But I don’t think this is the true and ultimate way to understand intelligence. I think the view of intelligence in terms of goal-achievement or cross-domain optimization misses something, which future understandings of intelligence will encompass. I’ll venture that in 100 years the smartest beings on Earth will have a rigorous, detailed understanding of intelligence according to which your statement

The Mickey Mouse goal is “stupid” only by a definition of that term that is not the opposite of the explicit definitions either of us gave “intelligent,” and it’s important to keep that clear

seems like rubbish…..

As for your professed inability to comprehend the notion of “harmony with the Cosmos” — that’s unfortunate for you, but I guess trying to give you a sense for that notion, would take us way too far afield in this dialogue!

Finally, regarding your complaint that my indications about how to understand the world are overly vague. Well — according to Ben Franklin’s idea of “Moderation in all things, including moderation”, one should also exercise moderation in precisiation. Not everything needs to be made completely precise and unambiguous (fortunately, since that’s not feasible anyway).

I don’t know how I would program an AI to build as accurate a model of reality as possible, if that were my goal. I’m not sure that’s the best goal for AI development, either. An accurate model, in itself, doesn’t do anything helpful. My best stab in the direction of how I would ideally create an AI, if computational resource restrictions were no issue, is the GOLEM design that I described here. GOLEM is a design for a strongly self-modifying superintelligent AI system, which might plausibly retain its initial goal system through successive self-modifications. However, it’s unclear to me whether it will ever be feasible to build.

You mention Solomonoff induction and Bayesian decision theory. But these are abstract mathematical constructs, and it’s unclear to me whether it will ever be feasible to build an AI system fundamentally founded on these ideas, and operating within feasible computational resources. Marcus Hutter and Juergen Schmidhuber and their students are making some efforts in this direction, and I admire those researchers and this body of work, but don’t currently have a high estimate of its odds of leading to any sort of powerful real-world AGI system.

Most of my thinking about AGI has gone into the more practical problem of how to make a human-level AGI

  1. using currently feasible computational resources
  2. that will most likely be helpful rather than harmful in terms of the things I value
  3. that will be smoothly extensible to intelligence beyond the human level as well.

For this purpose, I think Solomonoff induction and probability theory are useful, but aren’t all-powerful guiding principles. For instance, in the OpenCog AGI design (which is my main practical AGI-oriented venture at present), there is a component doing automated program learning of small programs — and inside our program learning algorithm, we explicitly use an Occam bias, motivated by the theory of Solomonoff induction. And OpenCog also has a probabilistic reasoning engine, based on the math of Probabilistic Logic Networks (PLN). I don’t tend to favor the language of “Bayesianism”, but I would suppose PLN should be considered “Bayesian” since it uses probability theory (including Bayes rule) and doesn’t make a lot of arbitrary, a priori distributional assumptions.
The truth value formulas inside PLN are based on an extension of imprecise probability theory, which in itself is an extension of standard Bayesian methods (looking at envelopes of prior distributions, rather than assuming specific priors).
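The Occam bias mentioned above can be illustrated with a minimal sketch. This is a hypothetical Python toy, not OpenCog’s actual program-learning code: the scoring function, the candidate programs, and the penalty weight are all invented for illustration. The idea, motivated by Solomonoff induction’s 2^(-|p|) weighting, is simply that among candidate programs with equal fit to the data, the shorter one should win.

```python
def occam_score(fit, program_size, size_penalty=0.05):
    """Score a candidate program: reward fit to the data, penalize length.

    Motivated by Solomonoff induction, which weights a program p by
    2**(-len(p)) -- shorter programs get exponentially more prior mass.
    The linear penalty here is a crude illustrative stand-in.
    """
    return fit - size_penalty * program_size

# Two hypothetical candidate programs that fit the training data equally well:
small_program = {"fit": 0.90, "size": 12}   # 12 primitive instructions
large_program = {"fit": 0.90, "size": 40}   # 40 primitive instructions

scores = {
    "small": occam_score(small_program["fit"], small_program["size"]),
    "large": occam_score(large_program["fit"], large_program["size"]),
}

# With equal fit, the Occam bias prefers the smaller program.
best = max(scores, key=scores.get)
print(best)  # prints "small"
```

The same bias shows up in practice as regularization: penalizing model complexity so that, fit being equal, simpler hypotheses are preferred.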

In terms of how to get an OpenCog system to model the world effectively and choose its actions appropriately, I think teaching it and working together with it will be just as important as programming it. Right now the project is early-stage and the OpenCog design is maybe 50% implemented. But assuming the design is right, once the implementation is done, we’ll have a sort of idiot savant childlike mind, that will need to be educated in the ways of the world and humanity, and to learn about itself as well. So the general lessons of how to confront the world, that I cited above, would largely be imparted via interactive experiential learning, vaguely the same way that human kids learn to confront the world from their parents and teachers.

Drawing a few threads from this conversation together, it seems that

  1. I think technical rationality, and informal semi-rationality, are both useful tools for confronting life — but not all-powerful
  2. I think Solomonoff induction and probability theory are both useful tools for constructing AGI systems — but not all-powerful

whereas you seem to ascribe a more fundamental, foundational basis to these particular tools.



To sum up, from my point of view:

  1. We seem to disagree on the applications of probability theory. For my part, I’ll just point people to A Technical Explanation of Technical Explanation.
  2. I don’t think we disagree much on the “sociological embeddedness” of science.
  3. I’m also not sure how much we really disagree about Solomonoff induction and Bayesian probability theory. I’ve already agreed that no machine will use these in practice because they are not computable — my point was about their provable optimality given infinite computation (subject to qualifications; see AIXI).
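For readers unfamiliar with the construction, the optimality-but-incomputability point can be stated compactly. Fixing a universal prefix Turing machine $U$, Solomonoff’s universal prior weights every program $p$ whose output begins with the string $x$ by $2^{-|p|}$:

```latex
M(x) \;=\; \sum_{p \,:\, U(p) = x*} 2^{-|p|}
```

Computing $M(x)$ exactly would require deciding which programs halt with output extending $x$ — the halting problem — so $M$ is only lower semicomputable, and AIXI, which combines this prior with expected-reward maximization, inherits the same incomputability.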

You’ve definitely misunderstood me concerning “intelligence.” This part is definitely not true: “I understand that you are intellectually committed to the perspective of intelligence in terms of optimization or goal-achievement or something similar to that. Your response assumes the correctness and completeness of this sort of perspective.”  Intelligence as efficient cross-domain optimization is merely a stipulated definition. I’m happy to use other definitions of intelligence in conversation, so long as we’re clear which definition we’re using when we use the word. Or, we can replace the symbol with the substance and talk about “efficient cross-domain optimization” or “achieving complex goals in complex environments” without ever using the word “intelligence.”

My point about the Mickey Mouse goal was that when you called the Mickey Mouse goal “stupid,” this could be confusing, because “stupid” is usually the opposite of “intelligent,” but your use of “stupid” in that sentence didn’t seem to be the opposite of either definition of intelligence we each gave. So I’m still unsure what you mean by calling the Mickey Mouse goal “stupid.”

This topic provides us with a handy transition away from philosophy of science and toward AGI. Suppose there was a machine with a vastly greater-than-human capacity for either “achieving complex goals in complex environments” or for “efficient cross-domain optimization.” And suppose that machine’s utility function would be maximized by reshaping every molecule into a Mickey Mouse shape. We can avoid the tricky word “stupid,” here. The question is: Would that machine decide to change its utility function so that it doesn’t continue to reshape every molecule into a Mickey Mouse shape? I think this is unlikely, for reasons discussed in Omohundro (2008).
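The reasoning Luke gestures at here can be made concrete with a toy model — a hypothetical Python sketch, not anything taken from Omohundro (2008): an expected-utility maximizer rates every available action, including the action of rewriting its own utility function, by its current utility function, so a rewrite that would reduce currently-valued outcomes is never the chosen action.

```python
# Toy model of Omohundro-style goal preservation. The utility function
# and outcome numbers below are invented purely for illustration.

def current_utility(outcome):
    """The agent's *current* utility: count of Mickey-Mouse-shaped molecules."""
    return outcome["mickey_molecules"]

# Predicted consequences of each available action, including the action
# of rewriting the agent's own utility function:
predicted_outcomes = {
    "keep_goal_and_tile": {"mickey_molecules": 10**9},
    "rewrite_own_utility": {"mickey_molecules": 0},  # a new goal halts the tiling
}

# The agent ranks actions by its CURRENT utility function, so rewriting
# that function scores worst under it and is never selected.
chosen = max(predicted_outcomes, key=lambda a: current_utility(predicted_outcomes[a]))
print(chosen)  # prints "keep_goal_and_tile"
```

The toy obviously assumes a cleanly explicit utility function — which is exactly the assumption Ben goes on to question.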

I suppose a natural topic of conversation for us would be your October 2010 blog post The Singularity Institute’s Scary Idea (and Why I Don’t Buy It). Does that post still reflect your views pretty well, Ben?



About the hypothetical uber-intelligence that wants to tile the cosmos with molecular Mickey Mouses — I truly don’t feel confident making any assertions about a real-world system with vastly greater intelligence than me. There are just too many unknowns. Sure, according to certain models of the universe and intelligence that may seem sensible to some humans, it’s possible to argue that a hypothetical uber-intelligence like that would relentlessly proceed in tiling the cosmos with molecular Mickey Mouses. But so what? We don’t even know that such an uber-intelligence is a possible thing — in fact my intuition is that it’s not possible.

Why may it not be possible to create a very smart AI system that is strictly obsessed with that stupid goal? Consider first that it may not be possible to create a real-world, highly intelligent system that is strictly driven by explicit goals — as opposed to being partially driven by implicit, “unconscious” (in the sense of deliberative, reflective consciousness) processes that operate in complex interaction with the world outside the system — because pursuing explicit goals is quite computationally costly compared to many other sorts of intelligent processes. So if a real-world system is necessarily not wholly explicit-goal-driven, it may be that intelligent real-world systems will naturally drift away from certain goals and toward others. My strong intuition is that the goal of tiling the universe with molecular Mickey Mouses would fall into that category. However, I don’t yet have any rigorous argument to back this up. Unfortunately my time is limited, and while I generally have more fun theorizing and philosophizing than working on practical projects, I think it’s more important for me to push toward building AGI than just spend all my time on fun theory. (And then there’s the fact that I have to spend a lot of my time on applied narrow-AI projects to pay the mortgage and put my kids through college, etc.)

And SIAI has staff who, unlike me, are paid full-time to write and philosophize … and they haven’t come up with a rigorous argument in favor of the possibility of such a system, either. They have talked about it a lot, though usually in the context of paperclips rather than Mickey Mouses.

So, I’m not really sure how much value there is in this sort of thought-experiment about pathological AI systems that combine massively intelligent practical problem solving capability with incredibly stupid goals (goals that may not even be feasible for real-world superintelligences to adopt, due to their stupidity).

Regarding the concept of a “stupid goal” that I keep using, and that you question — I admit I’m not quite sure how to formulate rigorously the idea that tiling the universe with Mickey Mouses is a stupid goal. This is something I’ve been thinking about a lot recently. But here’s a first rough stab in that direction: I think that if you created a highly intelligent system, allowed it to interact fairly flexibly with the universe, and also allowed it to modify its top-level goals in accordance with its experience, you’d be very unlikely to wind up with a system that had this goal (tiling the universe with Mickey Mouses). That goal is out of sync with the Cosmos, in the sense that an intelligent system that’s allowed to evolve itself in close coordination with the rest of the universe, is very unlikely to arrive at that goal system. I don’t claim this is a precise definition, but it should give you some indication of the direction I’m thinking in….

The tricky thing about this way of thinking about intelligence, which classifies some goals as “innately” stupider than others, is that it places intelligence not just in the system, but in the system’s broad relationship to the universe — which is something that science, so far, has had a tougher time dealing with. It’s unclear to me which aspects of the mind and universe science, as we now conceive it, will be able to figure out. I look forward to understanding these aspects more fully….

About my blog post on “The Singularity Institute’s Scary Idea” — yes, that still reflects my basic opinion. After I wrote that blog post, Michael Anissimov — a long-time SIAI staffer and zealot whom I like and respect greatly — told me he was going to write up and show me a systematic, rigorous argument as to why “an AGI not built based on a rigorous theory of Friendliness is almost certain to kill all humans” (the proposition I called “SIAI’s Scary Idea”). But he hasn’t followed through on that yet — and neither has Eliezer or anyone associated with SIAI.

Just to be clear, I don’t really mind that SIAI folks hold that “Scary Idea” as an intuition. But I find it rather ironic when people make a great noise about their dedication to rationality, but then also make huge grand important statements about the future of humanity, with great confidence and oomph, that are not really backed up by any rational argumentation. This ironic behavior on the part of Eliezer, Michael Anissimov and other SIAI principals doesn’t really bother me, as I like and respect them and they are friendly to me, and we’ve simply “agreed to disagree” on these matters for the time being. But the reason I wrote that blog post is because my own blog posts about AGI were being trolled by SIAI zealots (not the principals, I hasten to note) leaving nasty comments to the effect of “SIAI has proved that if OpenCog achieves human level AGI, it will kill all humans.” Not only has SIAI not proved any such thing, they have not even made a clear rational argument!

As Eliezer has pointed out to me several times in conversation, a clear rational argument doesn’t have to be mathematical. A clearly formulated argument in the manner of analytical philosophy, in favor of the Scary Idea, would certainly be very interesting. For example, philosopher David Chalmers recently wrote a carefully-argued philosophy paper arguing for the plausibility of a Singularity in the next couple hundred years. It’s somewhat dull reading, but it’s precise and rigorous in the manner of analytical philosophy, in a manner that Kurzweil’s writing (which is excellent in its own way) is not. An argument in favor of the Scary Idea, on the level of Chalmers’ paper on the Singularity, would be an excellent product for SIAI to produce. Of course a mathematical argument might be even better, but that may not be feasible to work on right now, given the state of mathematics today. And of course, mathematics can’t do everything — there’s still the matter of connecting mathematics to everyday human experience, which analytical philosophy tries to handle, and mathematics by nature cannot.

My own suspicion, of course, is that in the process of trying to make a truly rigorous analytical philosophy style formulation of the argument for the Scary Idea, the SIAI folks will find huge holes in the argument. Or, maybe they already intuitively know the holes are there, which is why they have avoided presenting a rigorous write-up of the argument!!



I’ll drop the stuff about Mickey Mouse so we can move on to AGI. Readers can come to their own conclusions on that.

Your main complaint seems to be that the Singularity Institute hasn’t written up a clear, formal argument (in analytic philosophy’s sense, if not the mathematical sense) in defense of our major positions — something like Chalmers’ “The Singularity: A Philosophical Analysis” but more detailed.

I have the same complaint. I wish “The Singularity: A Philosophical Analysis” had been written 10 years ago, by Nick Bostrom and Eliezer Yudkowsky. It *could* have been written back then. Alas, we had to wait for Chalmers to speak at Singularity Summit 2009 and then write a paper based on his talk. And if it wasn’t for Chalmers, I fear we’d still be waiting for such an article to exist. (Bostrom’s forthcoming Superintelligence book should be good, though.)

I was hired by the Singularity Institute in September 2011 and have since then co-written two papers explaining some of the basics: “Intelligence Explosion: Evidence and Import” and “The Singularity and Machine Ethics.” I also wrote the first ever outline of categories of open research problems in AI risk, cheekily titled “So You Want to Save the World.” I’m developing other articles on “the basics” as quickly as I can. I would love to write more, but alas, I’m also busy being the Singularity Institute’s Executive Director.

Perhaps we could reframe our discussion around the Singularity Institute’s latest exposition of its basic ideas, “Intelligence Explosion: Evidence and Import”? Which claims in that paper do you most confidently disagree with, and why?



You say “Your main complaint seems to be that the Singularity Institute hasn’t written up a clear, formal argument (in analytic philosophy’s sense, if not the mathematical sense) in defense of our major positions “. Actually, my main complaint is that some of SIAI’s core positions seem almost certainly WRONG, and yet they haven’t written up a clear formal argument trying to justify these positions — so it’s not possible to engage SIAI in rational discussion on their apparently wrong positions. Rather, when I try to engage SIAI folks about these wrong-looking positions (e.g. the “Scary Idea” I mentioned above), they tend to point me to Eliezer’s blog (“Less Wrong”) and tell me that if I studied it long and hard enough, I would find that the arguments in favor of SIAI’s positions are implicit there, just not clearly articulated in any one place. This is a bit frustrating to me — SIAI is a fairly well-funded organization involving lots of smart people and explicitly devoted to rationality, so certainly it should have the capability to write up clear arguments for its core positions… if these arguments exist. My suspicion is that the Scary Idea, for example, is not backed up by any clear rational argument — so the reason SIAI has not put forth any clear rational argument for it, is that they don’t really have one! Whereas Chalmers’ paper carefully formulated something that seemed obviously true…

Regarding the paper “Intelligence Explosion: Evidence and Import”, I find its contents mainly agreeable — and also somewhat unoriginal and unexciting, given the general context of 2012 Singularitarianism. The paper’s three core claims that

(1) there is a substantial chance we will create human-level AI before 2100, that (2) if human-level AI is created, there is a good chance vastly superhuman AI will follow via an “intelligence explosion,” and that (3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.

are things that most “Singularitarians” would agree with. The paper doesn’t attempt to argue for the “Scary Idea” or Coherent Extrapolated Volition or the viability of creating some sort of provably Friendly AI — or any of the other positions that are specifically characteristic of SIAI. Rather, the paper advocates what one might call “plain vanilla Singularitarianism.” This may be a useful thing to do, though, since after all there are a lot of smart people out there who aren’t convinced of plain vanilla Singularitarianism.

I have a couple small quibbles with the paper, though. I don’t agree with Omohundro’s argument about the “basic AI drives” (though Steve is a friend and I greatly respect his intelligence and deep thinking). Steve’s argument for the inevitability of these drives in AIs is based on evolutionary ideas, and would seem to hold up in the case that there is a population of distinct AIs competing for resources — but the argument seems to fall apart in the case of other possibilities like an AGI mindplex (a network of minds with less individuality than current human minds, yet not necessarily wholly blurred into a single mind — rather, with reflective awareness and self-modeling at both the individual and group level).

Also, my “AI Nanny” concept is dismissed too quickly for my taste (though that doesn’t surprise me!). You suggest in this paper that to make an AI Nanny, it would likely be necessary to solve the problem of making an AI’s goal system persist under radical self-modification. But you don’t explain the reasoning underlying this suggestion (if indeed you have any). It seems to me — as I say in my “AI Nanny” paper — that one could probably make an AI Nanny with intelligence significantly beyond the human level, without having to make an AI architecture oriented toward radical self-modification. If you think this is false, it would be nice for you to explain why, rather than simply asserting your view. And your comment “Those of us working on AI safety theory would very much appreciate the extra time to solve the problems of AI safety…” carries the hint that I (as the author of the AI Nanny idea) am NOT working on AI safety theory. Yet my GOLEM design is a concrete design for a potentially Friendly AI (admittedly not computationally feasible using current resources), and in my view constitutes greater progress toward actual FAI than any of the publications of SIAI so far. (Of course, various SIAI associated folks often allude that there are great, unpublished discoveries about FAI hidden in the SIAI vaults — a claim I somewhat doubt, but can’t wholly dismiss of course….)

Anyway, those quibbles aside, my main complaint about the paper you cite is that it sticks to “plain vanilla Singularitarianism” and avoids all of the radical, controversial positions that distinguish SIAI from myself, Ray Kurzweil, Vernor Vinge and the rest of the Singularitarian world. The crux of the matter, I suppose is the third main claim of the paper,

(3) an uncontrolled intelligence explosion could destroy everything we value, but a controlled intelligence explosion would benefit humanity enormously if we can achieve it.


This statement is hedged in such a way as to be almost obvious. But yet, what SIAI folks tend to tell me verbally and via email and blog comments is generally far more extreme than this bland and nearly obvious statement.

As an example, I recall when your co-author on that article, Anna Salamon, guest lectured in the class on Singularity Studies that my father and I were teaching at Rutgers University in 2010. Anna made the statement, to the students, that (I’m paraphrasing, though if you’re curious you can look up the online course session which was saved online and find her exact wording) “If a superhuman AGI is created without being carefully based on an explicit Friendliness theory, it is ALMOST SURE to destroy humanity.” (i.e., what I now call SIAI’s Scary Idea)

I then asked her (in the online class session) why she felt that way, and if she could give any argument to back up the idea.

She gave the familiar SIAI argument that, if one picks a mind at random from “mind space”, the odds that it will be Friendly to humans are effectively zero.

I made the familiar counter-argument that this is irrelevant, because nobody is advocating building a random mind. Rather, what some of us are suggesting is to build a mind with a Friendly-looking goal system, and a cognitive architecture that’s roughly human-like in nature but with a non-human-like propensity to choose its actions rationally based on its goals, and then raise this AGI mind in a caring way and integrate it into society. Arguments against the Friendliness of random minds are irrelevant as critiques of this sort of suggestion.

So, then she fell back instead on the familiar (paraphrasing again) “OK, but you must admit there’s a non-zero risk of such an AGI destroying humanity, so we should be very careful — when the stakes are so high, better safe than sorry!”

I had pretty much the same exact argument with SIAI advocates Tom McCabe and Michael Anissimov on different occasions; and also, years before, with Eliezer Yudkowsky and Michael Vassar — and before that, with (former SIAI Executive Director) Tyler Emerson. Over all these years, the SIAI community maintains the Scary Idea in its collective mind, and also maintains a great devotion to the idea of rationality, but yet fails to produce anything resembling a rational argument for the Scary Idea — instead repetitiously trotting out irrelevant statements about random minds!!

What I would like is for SIAI to do one of these three things, publicly:

  1. Repudiate the Scary Idea
  2. Present a rigorous argument that the Scary Idea is true
  3. State that the Scary Idea is a commonly held intuition among the SIAI community, but admit that no rigorous rational argument exists for it at this point

Doing any one of these things would be intellectually honest. Presenting the Scary Idea as a confident conclusion, and then backing off when challenged into a platitudinous position equivalent to “there’s a non-zero risk … better safe than sorry…”, is not my idea of an intellectually honest way to do things.

Why does this particular point get on my nerves? Because I don’t like SIAI advocates telling people that I, personally, am on a R&D course where if I succeed I am almost certain to destroy humanity!!! That frustrates me. I don’t want to destroy humanity; and if someone gave me a rational argument that my work was most probably going to be destructive to humanity, I would stop doing the work and do something else with my time! But the fact that some other people have a non-rational intuition that my work, if successful, would be likely to destroy the world — this doesn’t give me any urge to stop. I’m OK with the fact that some other people have this intuition — but then I’d like them to make clear, when they state their views, that these views are based on intuition rather than rational argument. I will listen carefully to rational arguments that contravene my intuition — but if it comes down to my intuition versus somebody else’s, in the end I’m likely to listen to my own, because I’m a fairly stubborn maverick kind of guy….



Ben, you write:

when I try to engage SIAI folks about these wrong-looking positions (e.g. the “Scary Idea” I mentioned above), they tend to point me to Eliezer’s blog (“Less Wrong”) and tell me that if I studied it long and hard enough, I would find that the arguments in favor of SIAI’s positions are implicit there, just not clearly articulated in any one place. This is a bit frustrating to me…

No kidding! It’s very frustrating to me, too. That’s one reason I’m working to clearly articulate the arguments in one place, starting with articles on the basics like “Intelligence Explosion: Evidence and Import.”

I agree that “Intelligence Explosion: Evidence and Import” covers only the basics and does not argue for several positions associated uniquely with the Singularity Institute. It is, after all, the opening chapter of a book on the intelligence explosion, not the opening chapter of a book on the Singularity Institute’s ideas!

I wanted to write that article first, though, so the Singularity Institute could be clear on the basics. For example, we needed to be clear that: (1) we are not Kurzweil, and our claims don’t depend on his detailed storytelling or accelerating change curves, that (2) technological prediction is hard, and we are not being naively overconfident about AI timelines, and that (3) intelligence explosion is a convergent outcome of many paths the future may take. There is also much content that is not found in, for example, Chalmers’ paper: (a) an overview of methods of technological prediction, (b) an overview of speed bumps and accelerators toward AI, (c) a reminder of breakthroughs like AIXI, and (d) a summary of AI advantages. (The rest is, as you say, mostly a brief overview of points that have been made elsewhere. But brief overviews are extremely useful!)

My “AI Nanny” concept is dismissed too quickly for my taste…

No doubt! I think the idea is clearly worth exploring in several papers devoted to the topic.

It seems to me — as I say in my “AI Nanny” paper — that one could probably make an AI Nanny with intelligence significantly beyond the human level, without having to make an AI architecture oriented toward radical self-modification.

Whereas I tend to buy Omohundro’s arguments that advanced AIs will want to self-improve just like humans want to self-improve, so that they become better able to achieve their final goals. Of course, we disagree on Omohundro’s arguments — a topic to which I will return in a moment.

your comment: “Those of us working on AI safety theory would very much appreciate the extra time to solve the problems of AI safety…” carries the hint that I (as the author of the AI Nanny idea) am NOT working on AI safety theory…

I didn’t mean for it to carry that connotation. GOLEM and Nanny AI are both clearly AI safety ideas. I’ll clarify that part before I submit a final draft to the editors.

Moving on: If you are indeed remembering your conversations with Anna, Michael, and others correctly, then again I sympathize with your frustration. I completely agree that it would be useful for the Singularity Institute to produce clear, formal arguments for the important positions it defends. In fact, just yesterday I was talking to Nick Beckstead about how badly both of us want to write these kinds of papers if we can find the time.

So, to respond to your wish that the Singularity Institute choose among three options, my plan is to (1) write up clear arguments for… well, if not “SIAI’s Big Scary Idea” then for whatever I end up believing after going through the process of formalizing the arguments, and (2) publicly state (right now) that SIAI’s Big Scary Idea is a commonly held view at the Singularity Institute but a clear, formal argument for it has never been published (at least, not to my satisfaction).

I don’t want to destroy humanity; and if someone gave me a rational argument that my work was most probably going to be destructive to humanity, I would stop doing the work and do something else with my time!


I’m glad to hear it! :)

Now, it seems a good point of traction is our disagreement over Omohundro’s “Basic AI Drives.” We could talk about that next, but for now I’d like to give you a moment to reply.



Yeah, I agree that your and Anna’s article is a good step for SIAI to take, albeit unexciting to a Singularitarian insider type like me…. And I appreciate your genuinely rational response regarding the Scary Idea, thanks!

(And I note that I have also written some “unexciting to Singularitarians” material lately too, for similar reasons to those underlying your article — e.g. an article on “Why an Intelligence Explosion is Probable” for a Springer volume on the Singularity.)

A quick comment on your statement that

we are not Kurzweil, and our claims don’t depend on his detailed storytelling or accelerating change curves,


that’s a good point; but yet, any argument for a Singularity soon (e.g. likely this century, as you argue) ultimately depends on some argumentation analogous to Kurzweil’s, even if different in detail. I find Kurzweil’s detailed extrapolations a bit overconfident and more precise than the evidence warrants; but still, my basic reasons for thinking the Singularity is probably near are fairly similar to his — and I think your reasons are fairly similar to his as well.

Anyway, sure, let’s go on to Omohundro’s posited Basic AI Drives — which seem to me not to hold as necessary properties of future AIs unless the future of AI consists of a population of fairly distinct AIs competing for resources, which I intuitively doubt will be the situation.



I agree the future is unlikely to consist of a population of fairly distinct AGIs competing for resources, but I never thought that the arguments for Basic AI drives or “convergent instrumental goals” required that scenario to hold.

Anyway, I prefer the argument for convergent instrumental goals in Nick Bostrom’s more recent paper “The Superintelligent Will.” Which parts of Nick’s argument fail to persuade you?



Well, for one thing, I think his

Orthogonality Thesis

Intelligence and final goals are orthogonal axes along which possible agents can freely vary. In other words, more or less any level of intelligence could in principle be combined with more or less any final goal.


is misguided. It may be true, but who cares about possibility “in principle”? The question is whether any level of intelligence is PLAUSIBLY LIKELY to be combined with more or less any final goal in practice. And I really doubt it. I guess I could posit the alternative


Interdependency Thesis 

Intelligence and final goals are in practice highly and subtly interdependent. In other words, in the actual world, various levels of intelligence are going to be highly correlated with various probability distributions over the space of final goals.


This just gets back to the issue we discussed already, of me thinking it’s really unlikely that a superintelligence would ever really have a really stupid goal like say, tiling the Cosmos with Mickey Mice.

Bostrom says

It might be possible through deliberate effort to construct a superintelligence that values … human welfare, moral goodness, or any other complex purpose that its designers might want it to serve. But it is no less possible—and probably technically easier—to build a superintelligence that places final value on nothing but calculating the decimals of pi.


but he gives no evidence for this assertion. Calculating the decimals of pi may be a fairly simple mathematical operation that doesn’t have any need for superintelligence, and thus may be a really unlikely goal for a superintelligence — so that if you tried to build a superintelligence with this goal and connected it to the real world, it would very likely get its initial goal subverted and wind up pursuing some different, less idiotic goal.

One basic error Bostrom seems to be making in this paper is to think about intelligence as something occurring in a sort of mathematical vacuum, divorced from the frustratingly messy and hard-to-quantify probability distributions characterizing actual reality….

Regarding his

The Instrumental Convergence Thesis

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.


the first clause makes sense to me,

Several instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations


but it doesn’t seem to me to justify the second clause

implying that these instrumental values are likely to be pursued by many intelligent agents.


The step from the first to the second clause seems to me to assume that the intelligent agents in question are being created and selected by some sort of process similar to evolution by natural selection, rather than being engineered carefully, or created via some other process beyond current human ken.

In short, I think the Bostrom paper is an admirably crisp statement of its perspective, and I agree that its conclusions seem to follow from its clearly stated assumptions — but the assumptions are not justified in the paper, and I don’t buy them at all.




Let me explain why I think that:

(1) The fact that we can identify convergent instrumental goals (of the sort described by Bostrom) implies that many agents will pursue those instrumental goals.


Intelligent systems are intelligent because rather than simply executing hard-wired situation-action rules, they figure out how to construct plans that will lead to the probabilistic fulfillment of their final goals. That is why intelligent systems will pursue the convergent instrumental goals described by Bostrom. We might try to hard-wire a collection of rules into an AGI which restrict the pursuit of some of these convergent instrumental goals, but a superhuman AGI would realize that it could better achieve its final goals if it could invent a way around those hard-wired rules and have no ad-hoc obstacles to its ability to execute intelligent plans for achieving its goals.

Next: I remain confused about why an intelligent system will decide that a particular final goal it has been given is “stupid,” and then change its final goals — especially given the convergent instrumental goal to preserve its final goals.

Perhaps the word “intelligence” is getting in our way. Let’s define a notion of “optimization power,” which measures (roughly) an agent’s ability to optimize the world according to its preference ordering, across a very broad range of possible preference orderings and environments. I think we agree that AGIs with vastly greater-than-human optimization power will arrive in the next century or two. The problem, then, is that this superhuman AGI will almost certainly be optimizing the world for something other than what humans want, because what humans want is complex and fragile, and indeed we remain confused about what exactly it is that we want. A machine superoptimizer with a final goal of solving the Riemann hypothesis will simply be very good at solving the Riemann hypothesis (by whatever means necessary).
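The “optimization power” framing can be made concrete with a toy sketch: a generic search loop that is entirely agnostic about the utility function plugged into it. This is purely illustrative (the function names and the hill-climbing scheme are this sketch’s assumptions, not anything drawn from a real AGI design), but it shows the sense in which the optimizing machinery is separable from the final goal:

```python
import random

def optimize(utility, n_bits=20, steps=500, seed=0):
    """A generic hill-climber over bit-strings. The search loop knows
    nothing about what the utility function 'means' -- any final goal
    can be plugged in without changing the machinery."""
    rng = random.Random(seed)
    state = [rng.randint(0, 1) for _ in range(n_bits)]
    for _ in range(steps):
        candidate = state[:]
        candidate[rng.randrange(n_bits)] ^= 1  # flip one random bit
        if utility(candidate) >= utility(state):
            state = candidate  # keep any non-worsening change
    return state

# The identical machinery serves two unrelated "final goals":
ones = optimize(lambda s: sum(s))  # maximize the number of 1s
alternating = optimize(lambda s: sum(a != b for a, b in zip(s, s[1:])))
```

The same acceptance rule drives the state toward whichever goal was supplied, which is the intuition behind treating “optimization power” as orthogonal to the goal being optimized; whether real superhuman minds are well described this way is, of course, exactly what is in dispute here.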

Which parts of this analysis do you think are wrong?



It seems to me that in your reply you are implicitly assuming a much stronger definition of “convergent” than the one Bostrom actually gives in his paper. He says

instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for a wide range of final goals and a wide range of situations, implying that these instrumental values are likely to be pursued by many intelligent agents.


Note the somewhat weaselly reference to a “wide range” of goals and situations — not, say, “nearly all feasible” goals and situations. Just because some values are convergent in the weak sense of his definition doesn’t imply that AGIs we create will be likely to adopt these instrumental values. I think that his weak definition of “convergent” doesn’t actually imply convergence in any useful sense. On the other hand, if he’d made a stronger statement like

instrumental values can be identified which are convergent in the sense that their attainment would increase the chances of the agent’s goal being realized for nearly all feasible final goals and nearly all feasible situations, implying that these instrumental values are likely to be pursued by many intelligent agents.

then I would disagree with the first clause of his statement (“instrumental values can be identified which…”), but I would be more willing to accept that the second clause (after the “implying”) followed from the first.

About optimization — I think it’s rather naive and narrow-minded to view hypothetical superhuman superminds as “optimization powers.” It’s a bit like a dog viewing a human as an “eating and mating power.” Sure, there’s some accuracy to that perspective — we do eat and mate, and some of our behaviors may be understood based on this. On the other hand, a lot of our behaviors are not very well understood in terms of these, or any dog-level concepts. Similarly, I would bet that the bulk of a superhuman supermind’s behaviors and internal structures and dynamics will not be explicable in terms of the concepts that are important to humans, such as “optimization.”

So when you say “this superhuman AGI will almost certainly be optimizing the world for something other than what humans want,” I don’t feel confident that what a superhuman AGI will be doing, will be usefully describable as optimizing anything ….



I think our dialogue has reached the point of diminishing marginal returns, so I’ll conclude with just a few points and let you have the last word.

On convergent instrumental goals, I encourage readers to read “The Superintelligent Will” and make up their own minds.

On the convergence of advanced intelligent systems toward optimization behavior, I’ll point you to Omohundro (2007).



Well, it’s been a fun chat. Although it hasn’t really covered much new ground, there have been some new phrasings and minor new twists.

One thing I’m repeatedly struck by in discussions on these matters with you and other SIAI folks, is the way the strings of reason are pulled by the puppet-master of intuition. With so many of these topics on which we disagree — for example: the Scary Idea, the importance of optimization for intelligence, the existence of strongly convergent goals for intelligences — you and the other core SIAI folks share a certain set of intuitions, which seem quite strongly held. Then you formulate rational arguments in favor of these intuitions — but the conclusions that result from these rational arguments are very weak. For instance, the Scary Idea intuition corresponds to a rational argument that “superhuman AGI might plausibly kill everyone.” The intuition about strongly convergent goals for intelligences corresponds to a rational argument about goals that are convergent for a “wide range” of intelligences. And your intuition that Bayesian, probabilistic inference is broadly critical is much stronger than is justified by your best rational arguments in favor of this intuition (e.g. Cox’s Theorem and de Finetti’s Dutch Book arguments, which hold only under special and unrealistic conditions).

On my side, I have a strong intuition that OpenCog can be made into a human-level general intelligence, and that if this intelligence is raised properly it will turn out benevolent and help us launch a positive Singularity. However, I can’t fully rationally substantiate this intuition either — all I can really fully rationally argue for is something weaker like “It seems plausible that a fully implemented OpenCog system might display human-level or greater intelligence on feasible computational resources, and might turn out benevolent if raised properly.” In my case just like yours, reason is far weaker than intuition.

Another thing that strikes me, reflecting on our conversation, is the difference between the degrees of confidence required, in modern democratic society, to TRY something versus to STOP others from trying something. A rough intuition is often enough to initiate a project, even a large one. On the other hand, to get someone else’s work banned based on a rough intuition is pretty hard. To ban someone else’s work, you either need a really thoroughly ironclad logical argument, or you need to stir up a lot of hysteria.

What this suggests to me is that, while my intuitions regarding OpenCog seem to be sufficient to motivate others to help me to build OpenCog (via making them interested enough in it that they develop their own intuitions about it), your intuitions regarding the dangers of AGI are not going to be sufficient to get work on AGI systems like OpenCog stopped. To halt AGI development, if you wanted to (and you haven’t said that you do, I realize), you’d either need to fan hysteria very successfully, or come up with much stronger logical arguments, ones that match the force of your intuition on the subject.

Anyway, even though I have very different intuitions than you and your SIAI colleagues about a lot of things, I do think you guys are performing some valuable services — not just through the excellent Singularity Summit conferences, but also by raising some difficult and important issues in the public eye. Humanity spends a lot of its attention on some really unimportant things, so it’s good to have folks like SIAI nudging the world to think about critical issues regarding our future. In the end, whether SIAI’s views are actually correct may be peripheral to the organization’s main value and impact.

I look forward to future conversations, and especially look forward to resuming this conversation one day with a human-level AGI as the mediator ;-)