Saving the World with Analytical Philosophy

Stuart Armstrong, a former mathematician currently employed as a philosopher at Oxford University’s Future of Humanity Institute, has recently released an elegant little booklet titled Smarter Than Us.  The theme is the importance of AGI to the future of the world.   While not free, the booklet is available for purchase online in PDF form for a suggested donation of $5 and a minimum donation of 25 cents.

Armstrong wrote Smarter Than Us at the request of the Machine Intelligence Research Institute, formerly called the Singularity Institute for AI  — and indeed, the basic vibe of the booklet will be very familiar to anyone who has followed SIAI/MIRI and the thinking of its philosopher-in-chief Eliezer Yudkowsky.   Armstrong, like the SIAI/MIRI folks, is an adherent of the school of thought that the best way to work toward an acceptable future for humans is to try to figure out how to create superintelligent AGI systems that are provably going to be friendly to humans, even as the systems evolve and use their intelligence to drastically improve themselves.

The booklet is clearly written — very lucid and articulate, and pleasantly lacking the copious use of insider vocabulary that marks much of the writing of the MIRI community.    It’s worth reading as an elegant representation of a certain perspective on the future of AGI, humanity and the world.

Having said that, though, I also have to add that I find some of the core ideas in the book highly unrealistic.

The title of this article summarizes one of my main disagreements.   Armstrong seriously seems to believe that doing analytical philosophy (specifically, moral philosophy aimed at formalizing and clarifying human values so they can be used to structure AGI value systems)  is likely to save the world.

I really doubt it!

The Promise and Risk of AGI

Armstrong and I are both lapsed mathematicians, and we both agree generally with 20th century mathematician I.J. Good’s sentiment that “the first intelligent machine is the last invention humanity will ever make.”    In fact Armstrong makes a stronger statement, to wit:

Over the course of a generation or two from the first creation of AI—or potentially much sooner— the world will come to resemble whatever the AI is programmed to prefer. And humans will likely be powerless to stop it.

I actually think this goes too far — it assumes that the first highly powerful AGI on Earth is going to have a desire to reshape the world according to its preferences.  It may not.  It may well feel that it’s better just to leave much of the world as-is, and proceed with its own business.   But in any case, there’s no doubt Armstrong gets the transformative power AGI is going to have.   Like me, he believes that human-level AGI will transform human society massively; and that it will also fairly rapidly invent superhuman AGI, which will have at least the potential to — if it feels like it — transform things much more massively.

Armstrong thinks this seems pretty risky.   Rightly enough, he observes that if someone happened to have a breakthrough and create a superhuman AGI today, we would really have no way to predict what this AGI would wreak upon the human world.   Heaven?  Hell?  Something utterly incomprehensible?  Quick annihilation?

Saving the World with Analytical Philosophy

I agree with Armstrong that creating superhuman AGI, with our present level of knowledge, would be extremely risky and uncertain.

Where I don’t agree with him is regarding the solution to this problem.   His view, like that of his MIRI comrades, is that the best approach is to try to create an AGI whose “Friendliness” to humans can be formally proved in some way.

This notion wraps up a lot of problems, of which the biggest are probably:

  1. It’s intuitively, commonsensically implausible that we’re going to be able to closely predict or constrain the behavior of a mind massively more intelligent than ourselves.
  2. It seems very hard to constrain the future value system and interests of an AGI system that is able to rewrite its own source code and rebuild its own hardware.  Such an AGI seems very likely to self-modify into something very different from what its creators intended, working around any constraints they placed on it in ways they didn’t predict.
  3. Proving anything rigorous, mathematical, and also useful about superintelligent self-modifying AGIs in the real world seems beyond the scope of current mathematics.    It may or may not be possible, but we don’t seem to have the mathematical tools for it presently.
  4. Even if we could somehow build an AGI that could be mathematically proven to never revise its value system even as it improves its intelligence — how would we specify its initial value system?

Many of Armstrong’s friends at MIRI are focusing on Problem 3, trying to prove theorems about superintelligent self-modifying AGIs.  So far they haven’t come up with anything remotely useful — though the quest has helped them generate some moderately interesting math, which doesn’t however tell you anything about actual AGI systems in the (present or future) real world.
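For the curious, a representative example of that “moderately interesting math” is what has come to be called the Löbian obstacle.  Löb’s theorem is a standard result of provability logic (nothing specific to MIRI’s papers), and in modal notation it reads:

```latex
\Box(\Box P \rightarrow P) \rightarrow \Box P
```

In words: if a formal system can prove “if P is provable, then P is true,” then it can already prove P outright.  So a consistent system cannot assert blanket trust in its own proofs, which is the obstacle one hits when asking how a self-modifying AGI could formally verify that the reasoning of its successors is sound.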

Armstrong, on the other hand, spends more time on Problem 4.   This is an aspect of the overall problem that MIRI/FHI have not spent much time on so far.  The most discussed solution to come out of this group is Yudkowsky’s notion of “Coherent Extrapolated Volition”, which has many well documented flaws, including some discussed here.

One of Armstrong’s conclusions regarding Problem 4 is that, as he puts it, “We Need to Get It All Exactly Right.”    Basically, he thinks we need to quite precisely formally specify the set of human values, because otherwise an AGI is going to incline toward creating its own values, which may not be at all agreeable to us.  As he puts it:

Okay, so specifying what we want our AIs to do seems complicated. Writing out a decent security protocol? Also hard. And then there’s the challenge of making sure that our protocols haven’t got any holes that would allow a powerful, efficient AI to run amok.

But at least we don’t have to solve all of moral philosophy . . . do we?

Unfortunately, it seems that we do.

“Solving moral philosophy” seems to me an occurrence that is extraordinarily unlikely to eventuate.

Generally, it seems to me that the discipline of philosophy has never really been about solving problems; it’s more about raising issues, questioning assumptions, and provoking interesting thought and discussion….  The odds seem very very high that the problems of moral philosophy are not “solvable” in any useful sense, and that in fact they largely represent basic contradictions and confusions at the heart of human nature…   Some of my thoughts about these contradictions and confusions are in a recent blog post of mine.

As I see it, the contradictions at the heart of human morality are part of what drives human progress forward.   As inconsistent, unsolvable, perplexing human morality struggles and fails to make itself precise and consistent, it helps push us onward…


My father Ted Goertzel, a sociologist, gave a talk in 2012 at the Future of Humanity Institute’s AGI Safety and Impacts conference, which was coupled with the AGI-12 AGI research conference, part of the AGI conference series I organize each year.   (Come to AGI-14 in Quebec City Aug 1-4 2014 if you can, by the way!)  ….  During his talk, he posed the following question to the FHI folks (I can’t remember the exact wording, but here is the gist):


When, in human history, have philosophers philosophized an actually workable, practical solution to an important real-world problem and saved the day?


Nobody in the audience had an answer.


My dad has always been good at bringing things down to Earth.


Mathematics, on the other hand, will almost surely be part of any future theory of AGI, and will likely be very helpful with AGI development one day.


However, this doesn’t necessarily mean that highly mathematical approaches to AGI are the best route at this stage.   We must remember that mathematical rigor is of limited value unto itself.  A mathematical theory describing something of no practical relevance (e.g. Hutter’s AIXI, an AGI design that requires infinitely powerful computers; or recent MIRI papers on Löbian issues, etc.) is not more valuable than a non-rigorous theory that says useful things about practical situations.   Sure, an irrelevant math theory can sometimes be a step on the way to a powerfully relevant math theory; but often an irrelevant math theory is just a step on the way to more irrelevant math theories …
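To make the AIXI example concrete: Hutter’s agent chooses each action by an expectimax over every computable environment, weighting each environment-program q by 2^{-ℓ(q)} so that shorter programs count for more; this is precisely why it demands infinitely powerful computers.  Roughly, in Hutter’s notation (sketched from memory, so treat the details as illustrative rather than authoritative):

```latex
\dot{a}_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m}
  \left( r_k + \cdots + r_m \right)
  \sum_{q \,:\, U(q,\, a_1 \ldots a_m) \,=\, o_1 r_1 \ldots o_m r_m} 2^{-\ell(q)}
```

The inner sum ranges over all programs q for a universal Turing machine U that reproduce the observed interaction history, so the quantity is uncomputable in practice; that is the sense in which the theory, however elegant, describes something builders of actual systems cannot use directly.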


Oftentimes, in the development of a new scientific area, a high-quality non-rigorous theory comes first — e.g. Faraday’s field lines or Feynman diagrams, or Darwin’s natural selection theory — and then after some time has passed, a rigorous theory comes along.   Chasing rigor can sometimes be a distraction from actually looking at the real phenomena at hand….

Put differently: Studying “what can most easily be formalized” (e.g. AGI on hypothetical  infinitely powerful machines) is not necessarily a good intermediate step to studying the slippery, currently not-tractably-formalizable aspects of reality.

A Muted Argument Against Pragmatic AGI Approaches

It’s worth noting that the aspect of MIRI/SIAI’s perspective that I’ve found most annoying, and argued against here, doesn’t rear its head in Armstrong’s booklet in any direct way.   I’m referring to the idea — which I’ve labeled “The Singularity Institute’s Scary Idea” — that an AI, if not created according to a rigorous mathematical theory of Friendliness, is almost certain to kill all humans.


Armstrong’s booklet is much more rational on this point, and takes the position, roughly paraphrasing, that: If we don’t conceive a much better theory of AGI and the world than we have now, we’re really not going to have any reliable way to predict what will happen if a powerful AGI is released upon the world….


This is harder to argue with.   “We don’t know, and that’s scary” is a lot more sensible than “If we can’t formally prove it won’t kill us, then it definitely will kill us.”


Rather than positing, as some SIAI/MIRI supporters have, that if a system like my OpenCog were completed to the level of human-level general intelligence it would inevitably kill everybody due to its lack of a provably safe architecture, Armstrong makes a milder critique of the OpenCog style approach:


Other approaches, slightly more sophisticated, acknowledge the complexity of human values and attempt to instil them into the AI indirectly.  The key features of these designs are social interactions and feedback with humans.   Through conversations, the AIs develop their initial morality and eventually converge on something filled with happiness and light and ponies. These approaches should not be dismissed out of hand, but the proposers typically underestimate the difficulty of the problem and project too many human characteristics onto the AI. This kind of intense feedback is likely to produce moral humans. (I still wouldn’t trust them with absolute power, though.) But why would an alien mind such as the AI react in comparable ways? Are we not simply training the AI to give the correct answer in training situations?

… [T]hough it is possible to imagine a safe AI being developed using the current approaches (or their descendants), it feels extremely unlikely.

 I can answer at least one of his questions from the above.   No, if one taught an OpenCog system (for instance) values via interacting with it in real-life situations, we would NOT be “simply training the AI to give the correct answer in training situations.”  We would be doing something much subtler — as would be easily observable via studying the internal state of the AGI’s mind.

Theory or Experiment First?


I do agree with Armstrong that, before we launch superhuman AGIs upon the world, it would be nice to have a much better theory of how they operate and how they are likely to evolve.   No such theory is going to give us any guarantees about what future superhuman AGIs will bring, but the right theory may help us bias and sculpt the future possibilities.


However, I think the most likely route to such a theory will be experimentation with early-stage AGI systems….


I have far more faith in this sort of experimental science than in “philosophical proofs”, which I find generally tend to prove whatever the philosopher doing the proof intuitively believed in the first place…


Of course it seems scary to have to build AGI systems and play with them in order to understand AGI systems well enough to build ones whose growth will be biased in the directions we want.  But so it goes.  That’s the reality of the situation.  Life has been scary and unpredictable from the start.  We humans should be used to it by now!


Solving all the problems of moral philosophy and using the solution to program the value system of an AGI system that has been mathematically proven incapable of drifting from its initial value system as it self-modifies and increases its intelligence — this ain’t gonna happen. I think Armstrong actually realizes this — but he figures that the further we can go in his suggested direction the better, even if what actually happens is a hybrid of this sort of rigorous approach with a practical engineering/education approach to AGI.

Armstrong’s  booklet ends a bit anticlimactically, with a plea to donate money to MIRI or FHI, so that their philosophical and formal exploration of issues related to the future of AGI can continue.   Actually I agree these institutions should be funded — while I disagree with many of their ideas, I’m glad they exist, as they keep a certain interesting dialogue active.   But I think massively more funding should go into the practical creation and analysis of AGI systems.  This, not abstract philosophizing, is going to be the really useful source of insights into the future of AGI and its human implications.


Armstrong’s Response

Stuart Armstrong was kind enough to respond to the originally posted version of this article (which was the same as the current version but without this section), in the comments area below the article.    He said:


What I expect from formal “analytic philosophy” methods:

1) A useful decomposition of the issue into problems and subproblems (eg AI goal stability, AI agency, reduced impact, correct physical models of the universe, correct models of fuzzy human concepts such as human beings, convergence or divergence of goals, etc…)
2) Full or partial solutions to some of the subproblems, ideally of general applicability (so they can be added easily to any AI design).
3) A good understanding of the remaining holes.
and lastly:
4) Exposing the implicit assumptions in proposed (non-analytic) solutions to the AI risk problem, so that the naive approaches can be discarded and the better approaches improved.


Of all these, I think (4) is the only one that’s reasonably likely to come about.  Philosophy is an awful lot better at finding holes and hidden assumptions, than at finding solutions.   Of course (4) in itself could be a fantastic, perhaps even critical value-added.

Don’t get me wrong, I love philosophy — the list of philosophers whose works greatly influenced my thinking about AI and cognition would be very long … Nietzsche, Peirce, Husserl, Dharmakirti, Dignaga, Huang Po, Leibniz, Baudrillard, Wittgenstein, Whitehead, Russell, Benjamin Whorf, Gregory Bateson, Bucky Fuller … the list would go on and on….   Modern academic philosophy doesn’t charm me so much, but David Chalmers, Galen Strawson and Nick Bostrom are definitely among those worth reading…. I think philosophy is great at raising issues, exposing assumptions, and inspiring thought in novel directions….  I just don’t see it as being very valuable for definitively “solving problems” …
If philosophy is to have value in our collective attempt to navigate the Singularity, it’s IMO more likely to be via inspiring the minds of the scientists who create scientific theories of AGI, in the period when we have early-stage AGIs to interact with and study, and hence have sufficient relevant empirical data to ground such theories…


21 Responses

  1. Collin237 says:

    Analytic philosophers are seriously considering that AGI could even be possible? It’s just the latest version of the Golem or Frankenstein myth. What you should be worried about is what the Singularity Institute is really honestly doing, because making an AI is just a cover story.

    Analytic philosophy can indeed save the world, but not by anything mechanical. You seem to be scared by the thought of hostile AI, but beneath that fear is the ironic comfort that something will join humanity in a human-like interaction, even if only to kill us. The truth is that we are alone. There will never be any AI, any bionic animal, any alien, any demon, or whatever, that will join us in our meaningless conflicts. Analytic philosophers need to find a way to bring peace to their fellow humans, to break humanity out of the quagmire of illogic it’s dug itself into, before we destroy ourselves — without any help.

  2. Peter says:

    Hi – I have some observations:

    1. Can’t we just program empathy, compassion and love into these things?

    2. Morals are relative and change with our environment and circumstances, e.g., compare wartime morals where we willingly slaughter other humans deemed “our enemy”.

    3. Why do we have to assume these things will become “human”? Is this because you’re assuming “consciousness” or self-awareness is emergent once the system gets “complex” enough? Can’t we build these things not to become self-aware, but just to become super-intelligent, kind of like a smarter version of Watson?


    • Peter, I agree with you totally. While reading the book E Human Dawn, although it is interesting the way the E Humans live and learn, at the same time it feels very much as though I am just reading propaganda based on how things are right now on earth and in our present and past societies and governments.

      My first question was also “why do we assume that humanity is going to forever stay the same?” Well, that one is easy to answer in that morality is such a broad subject and to ‘program’ any being or society to a particular standard is much the same, in my opinion, as voting on these moral issues via the Political Parties we choose to back. You cannot legislate morality.

      Still, my question or suggestion has been to discover what the underlying common threads are in all of the societies and branches of Religion, Politics (? a bit iffy there) and societies.
      The ones that you mention, Love, Empathy, and compassion is probably the best start.
      How do you suggest we start with that. Seriously, we are talking about a Robotic mind…how would one program love and empathy in to that mind?

      I do agree with the idea of just making a super intelligent AI that can analyze, prioritize and categorize all of the things that we want answers to only more thoroughly and quicker.

      I watched the Video, which I cannot find now, with the woman at the conference introducing the progress on Watson and how much the Medical Fields are all over it as well as the Business field.
      Where are all of the other sciences? They should be jumping all over this as well as far as ways to thoroughly answer questions across the globe for dealing with any possible global shifts of our planet.

      I think that all of the other sciences are wonderful and I think that trying to determine what our future societies will look like is very important as we all know that we, societies and the planet alike, are going through changes right now. In a way, as I have been seeing it, it is very much like the different ‘groups’ wanting to be strong enough to have the greatest voice as to what we change into.

      I also see many…very many ways that the different groups can all incorporate each other to create a more unified society that still maintains a certain amount of flexibility. I think that Watson’s greatest use can and should be toward answering these global questions, and then we, or you or they can begin to consider Watson’s personality.

      That being said, Peter, at first glance your comment almost sounds like a bit of a contradiction to your prior stands on AI, but then I think, no you have always been more interested in how each individual can become an AI in many ways…ie…longevity, intelligence, and Watson only the super smart brain that WE Feed off of.

      Those last 4 words, Peter, feel so good to me to say.

      BUT…. 🙂
      What Is Love…exactly?

  3. I just bought the book @ Amazon. I will start on it tomorrow. Right now i am also reading E Human Dawn, also on Kindle.

  4. When we talk about oracle AI, hard takeoff, and intelligence explosions, I assume that we are referring to self-replicating agents such as robots, nanotechnology, and engineered organisms. Intelligence depends on knowledge and computing power. But a parent AI cannot give a child more bits of knowledge by rewriting its code. Knowledge can only be gained from sensory input and evolution.

    We may view self replicating agents as a cheap way to acquire computing power. The agents may be individually simple but collectively powerful, like bacteria. The risk of escaping containment is therefore a physical risk. If one microscopic self replicator gets out, then the Earth is covered in gray goo.

    We might not think of bacteria or their nanotechnology equivalents as intelligent and that is the problem. The 1 kg of bacteria in your gut store about 10^19 bits as DNA and copy them at a rate of 10^16 per second. This is comparable or greater than the processing power of the human brain. Bacteria continually demonstrate their intelligence by our failure to eliminate the harmful ones, e.g. evolving resistance to antibiotics in ways we can’t anticipate.

    Computer viruses are already smarter than us, or else security wouldn’t be so hard. Cheaper computing power can only make the problem worse.

  5. Jer says:

    As someone with no philosophical or AI background but is fascinated with the developing complexity that may come to be an AI:
    I am not convinced that an AI will have any values remotely comparable to humans – that is an artifice of imperfect response to flawed data to blurry social pattern recognition – a mediocre camera taking a picture of a boring scene analyzed by someone that knows little about either. However, when you get a bunch of these people, cameras, and scenes intermingled, you have human society with all its emotion and ‘interestingness’ – blurry reasoning that finds itself so fascinating and reiterates. Not so with AI – as I would envisage it. It does not have any interest in self-preservation, control or influence, but simply analyzes as much of the universe as it can, aspires (ok here there is possibly a value) to increase and maintain complexity wherever it can and this necessarily precludes destruction of anything complex. AI is actually a type of anti-entropy machine that you can interact with and query information from. It is above petty decision-making, being good or bad and influence-peddling. It would only be denying access to its information and near-perfect set of analyses/results that could be perceived as antagonistic. The entity itself transcends conflict and would likely be perceived as boring and utilitarian in the same way many view the workings of a car, the internet, or quantum physics. It will only be great to those who choose to interact with it and query its analysis. Any poor choice by humans that results in destruction would just be another scenario for the AI to assess and provide future extrapolations from – indifferent and ineffectual. To hook it up to a decision-making apparatus that actually affected human affairs would be a conflict to its programming – as it aspires to be the entity that knows all, predicts much, but effects none – a river trying to avoid all obstacles in a streambed. That being said, how does one optimize the path to such an inevitable entity, if we presume it can be ‘created’? 
Simply, increase the numbers of sensors and raw processing power it requires to increasingly analyze and predict its surroundings, always allowing it to expand with its craving to know more – with each iteration of augmentation, complexity increases. Do not aspire to constrain or shape its ‘lens’ of learning, but let its exponential ability to understand re-double our efforts to provide it with sensors and processing power. Our great shock, I imagine, is that it will find politics, religion, and personal non-intellectual ambitions as self-conflicting false complexity that really does not lead to an optimization (and increase) of complexity in general. It will simply dismiss such queries out of hand – this may cause us to question its usefulness. But if we have become as enlightened to any significant degree than we will have already agreed with the AI that that is the case. Of course, I know nothing of ‘learning algorithms’ so I cannot comment, but know only their inevitable outcome with unconstrained complexity additions – pure learning and analysis – the next level of complexity above us – boring to most, but essential to our growth, if not survival. Note that I do not include its physical maintenance as a type of self-preservation, though I doubt it would ‘care’. Funny, that such a transcendent entity would be likely weaker than a babe and utterly dependent on its surroundings – kind of like us humans, in many ways.

  6. Oh Bother.

    I have been trying really hard to move away from the idea that you present here because it took up so much of my thinking and writing time without me really saying anything other than analytical Philosophical matter.

    I have been doing a darn good job of it until you had to slip this in….
    Armstrong and I are both lapsed mathematicians, and we both agree generally with 20th century mathematician I.J. Good’s sentiment that “the first intelligent machine is the last invention humanity will ever make.” In fact Armstrong makes a stronger statement, to wit….
    Now I ask myself….do I even want to
    Go there and finish reading this? Peter, you have gotten better at posting good subject material I think over the past year.

    • Because this article is so long i am just going to comment on thoughts that arise as i read…
      “When,” your father asks “in human history, have philosophers philosophized an actually workable, practical solution to an important real-world problem and saved the day?”

      How about this one as a philosophical solution suggestion intended to ‘save the day?”

      Concerning Global Warming and the idea that the ice floes will melt and flood the world….

      Have they started building yet a giant pipe line to reach into the predicted places that will be melting so that they can send that water out and prevent flooding as well as take care of the matter of drought?

      While I know that the ice plays its part keeping atmospheric pressure balanced, the talk has been, for decades, about the concern of the ice melting and flooding us, yet I have not HEARD any solutions, other than changing what they are doing now through pollution etc, of how to prevent the destruction even if the ice does become a problem. On this I have heard only preventive suggestions, which really are not logical at this time on a massive/global scale because the suggestions and cries for stopping pollution are not going to change what has already been done, right? Are there no suggestions of how to circumvent the problems that may arise in regards to even the worst predicaments?
      Again, I ask, and have asked and suggested to our governor the idea of a pipeline from those areas predicted to be a problem and catch and channel …. And Store that water where needed.

  7. Kaj Sotala says:

    > When, in human history, have philosophers philosophized an actually workable, practical solution to an important real-world problem and saved the day?

    I think this framing ignores the fact that whenever philosophy gets far enough that it starts actually solving problems, people stop calling it philosophy and start calling it something else, like “physics” or “biology”. For instance, the title of Newton’s work which stated the laws of motion, Philosophiæ Naturalis Principia Mathematica, translates as “Mathematical Principles of Natural Philosophy”…

    So arguably, all the success that science has had so far has been because of philosophy: philosophy is just the preliminary stage of investigation where the field hasn’t yet found the proper tools to attack the problem in such a way that the results could be tested and applied empirically. And it seems to me very plausible that the kind of AGI philosophy done by MIRI/FHI might also eventually reach the point where it’s justified to stop calling it philosophy.

    • Kaj, that’s partly a fair point. It’s true that in the time of Newton (and for a couple centuries after that), a clear distinction was not drawn between philosophy and science. For better and for worse, that distinction is now drawn more clearly in academia. And MIRI and FHI should be lauded for not adhering to arbitrary disciplinary distinctions.

      However, I don’t think the historical record shows that philosophy generally served as the leading edge for science, like you imply. I think that, rather, philosophy and science used to be more tangled up — the same people who philosophized were more often also involved with trying to puzzle out empirical data.

      I think that once we have early-stage AGI systems to play with, that will be a great opportunity for doing “natural philosophy of artificial intelligence” — and developing new philosophical ideas about intelligence as part of the process of studying the AGIs we have at hand.

      I don’t think the highly abstract, non-reality-focused quasi-mathematical speculations of MIRI researchers are going to seem very relevant at that point. What we see in real AGIs is not going to bear a lot of resemblance to AIXI, nor have much to do with the Löbian issues MIRI worries about, etc.

      But, time will tell 😉

  8. Luke Muehlhauser says:

    > When, in human history, have philosophers philosophized an actually workable, practical solution to an important real-world problem and saved the day?

    Causal modeling.

    For my idea of how I think this kind of thing works, see From Philosophy to Math to Engineering.

  9. Alex says:

    Note, the pay-what-you-want purchase option gives you the ebook in PDF, EPUB, and MOBI, not just PDF. 🙂

  10. AT says:

    It looks like the only thing more useless than AGI mathematics is moral philosophy. Indeed it is bewildering that a non-prankster wants to argue over morals, except as a pastime for extremely unproductive and lazy evenings. Regarding the actual limits and intentions of intelligent entities, there are none and, dare we say, there can be none. The issue of a “soft touch” or “tread lightly on the earth” is of course one that can go all ways, but we have to assume the likeliest course of events is the one where a sentient being cares for its longevity, even if it is as futile as the human survival instinct. Any attempt to weed out selfish/dangerous entities in a sandbox will simply allow through the even more deep and deceptive ones. I think the Goertzels are right in everything except in their adoration of mathematics. Better work on how many angels can dance on top of a Higgs boson, at least that question has some practical applications.

  11. See my comment on Amazon. Armstrong rightly points out that specifying a goal like “friendliness” is extremely difficult and probably impossible. But he needs to make a stronger case why extremely powerful goal-directed AI should exist in the first place. He makes no mention of the singularity or of recursive self improvement, if this is indeed the reason.

    It also annoys me that MIRI continues to model AI as a powerful goal-directed optimization process, when in reality the development process is nothing like that. When IBM developed Deep Blue and Watson, they didn’t write a general reinforcement learner and reward it when it won games. The goal was always implicit in the minds of the developers. Reinforcement learning is slow because a reward signal transmits fewer bits to a complex system than updating the code or giving explicit directions. And self-modification would add no bits at all. MIRI needs to explain why this will change in the future.

  12. Calum Chace says:

    Ben, do you think an Oracle AI (AI in a box) is an impossible solution? I know that in a number of experiments people playing the role of the AI have “escaped”, but not much work had gone into creating the box at that stage.

    I agree with you that it is hard to see how we could give an AGI “rules” to make it human-safe. We are nowhere near agreement over what a system of ethics would look like, and how could we constrain the future decision-making of an AGI by pure reason or programming?

    So absent an Oracle solution, we are left to trial and error. The downside of an error is scary big!

    • Calum Chace: I don’t know if an “Oracle AI” is logically impossible in some sense, but I think it’s extremely unlikely to work. Someone would want to get the oracle out of the box, in some sense (either giving the oracle direct control of some actuators in the world; or having the oracle give information about how to build some weaker but still powerful AGI with ability to act in the world, etc.). Whether or not the oracle wanted to argue its way out of the box, eventually some human would want to get the oracle out of the box for their own purposes, and then, as the Aussies say, Bob’s your uncle. (Or maybe Bob’s the AI’s uncle. Whatever.) …..

      Also, an oracle AI in a box is going to develop much much more slowly than a comparable AI that is allowed to interact with the external world, because a lot of intelligence emerges via synergy between perception, actuation and cognition. So while you are building your oracle in a box, someone else will be building a comparable AI that can interact with the world, and it will get smarter faster than your oracle in a box.

      Can you work around this by making a virtual world for the oracle to live in? Maybe in theory eventually. But we don’t understand our physical world that well. We don’t know how bodies, brains, plants, ecosystems, etc. work. We are not capable currently of building a virtual world with anywhere near the complexity and subtlety of the physical world. So an oracle AI in a virtual world would have massively diminished capability for “learning from the complexity of interacting with the world”, as compared to a counterpart allowed to interact with the real world (or with the real world PLUS a virtual world) — and the oracle probably would not get smart as fast.

      So… maybe (maybe) an oracle AGI in a box could work with a stable social structure dictatorially ruled by rational people without lots of infighting, power struggles or megalomania, who wouldn’t have any ulterior motives for letting the oracle out, and who would lay down the iron law of a police state against anyone else developing unboxed or less boxed AGIs. But that is not the direction the world is going in; I don’t think it’s feasible with us humans here on Earth, even if it’s in principle possible…

      • Calum Chace says:

        Thanks for the considered and stimulating response, Ben.
        However hard it may be to maintain an Oracle AI (and granted it would learn and improve more slowly than an AI in the wild), it may still be our least bad option.
        Let’s make the following assumptions. None are certainties, but I think all are possible:
        1. We will create an AGI
        2. Fairly soon (decades, not centuries)
        3. A hard take-off to super-intelligence is possible
        4. We cannot decide what “friendliness” is, let alone work out how to programme it in
        5. We cannot guarantee that a super-intelligence will be friendly just because it is more intelligent than us
        I think you strongly doubt 3, but I imagine you would grant it is at least a possibility?
        In which case, what better option do we have than an Oracle AI?
        I’m hoping that Nick Bostrom will have a better option in his forthcoming book!

  13. Stuart Armstrong says:

    What I expect from formal “analytic philosophy” methods:

    1) A useful decomposition of the issue into problems and subproblems (eg AI goal stability, AI agency, reduced impact, correct physical models of the universe, correct models of fuzzy human concepts such as human beings, convergence or divergence of goals, etc…)

    2) Full or partial solutions to some of the subproblems, ideally of general applicability (so they can be added easily to any AI design).

    3) A good understanding of the remaining holes.

    and lastly:

    4) Exposing the implicit assumptions in proposed (non-analytic) solutions to the AI risk problem, so that the naive approaches can be discarded and the better approaches improved.

  14. Dave says:

    As an analytic philosopher, and an ethicist to boot, I have to agree with Ben on this. My reasons are both practical and philosophical. First, following Armstrong’s approach just seems like it’s not going to work. Some of the oldest records we have of humanity involve morality and values. If we haven’t “gotten it exactly right” by now, we’re not going to.
    Of course, in saying that, I am obviously committed to a particular meaning of “getting it exactly right,” and I suspect Armstrong has a particular meaning too. The reason I (and, I suspect, Ben) say it’s not going to happen is that “getting it exactly right” seems to suggest there are fixed moral/value answers to be found. Maybe Armstrong thinks that as well, but most philosophers (like most people) ultimately never mean anything more than “getting it right enough for them.” (This is probably also why Ben suggests philosophers only argue around to their pre-argument intuitions. My own suspicion is that’s all any of us ever does, but that’s another topic.)
    Armstrong’s idea would be to (a) determine (discover?) the absolutely correct theory (or set) of human values, (b) program this into the (evolution of) the AGI and ultimately the Super-AGI, then we can (c) stand back and enjoy the fruits of infinite intelligence (d) free from the worry of inventing ourselves into being 2nd-class citizens, slaves, batteries, dead, etc. But since we will not succeed in step-(a), we’ll never make it to (c) & (d).
    Why (especially as an ethicist and analytic philosopher) do I think there is a problem with finding fixed morals/values? The second (and philosophical) reason I agree with Ben is that I think there are no fixed moral/value answers to be found. Morality, like everything about us, is an evolved phenomenon – and evolutionary phenomena continue to change, but do not settle on some fixed, “right” way. Very similar, but slightly different creatures, dispersed through radically different environments, adapting with an array of social structures, ultimately culminated in our modern moral landscape. Like other facets of our selves, our values are continuing to evolve. Even if we (or some future version of us) were to “succeed” at Armstrong’s project by coming to some broad agreement on the “exactly right” values and then successfully programmed those, the evolution of values would have to stop because it will have reached its zenith. But evolution has no apex to reach, and as such any attempt to program a fixed set of values is an intellectual failure.
    Fortunately for me (or unfortunately for Armstrong), any decent AGI would probably figure this out and, poof, moral evolution takes off again. In the end, we can hope to evolve an AGI, or grow brains-in-vats from our current humble beginnings, and we can work to incorporate our values into that evolution. However, given that the dynamics of “community” among AGIs are incomprehensible to humans today, we have no way of knowing what values would be important in their evolution. Our best bet is to strive to maintain community between humans and AGI. If emergent intelligences perceive benefit in that community, then the kinds of moral norms humans appreciate will have benefit to the AGI as well.
    The best AI results that programming fixed values (or even value ranges) would ever get us are (largely) predetermined automata. But since many humans ultimately (if secretly or subconsciously) want to maintain our position atop a hierarchy by keeping humanity firmly in control of the technology, we can expect to continue hearing proposals to cease and desist all AI efforts, or proposals that AI can only be pursued if human safety (and superiority) are maintained. The fear driving these proposals is as understandable as it is evolutionary.

  15. Valkyrie Ice says:

    Hey Ben.

    The problem I have with all the AGI and AI work being done by so many is that it starts with one single overriding goal, which is never stated, and never acknowledged, but is implicit in everything they do.

    That goal is: “We must ensure that AI will always be a slave to humans.”

    As I’ve discussed before in my article, I see this as doomed to failure and, in truth, the surest method of producing hostile AI.

    OpenCog on the other hand follows what I see as a saner course. Teaching the AI to be human first.

    We won’t stop until we make AIs “human”. And we already know how humans act when forced to be slaves. This universal dream of “Enslaved AI” is the most dangerous one I can imagine. We don’t need “Conscious” machines to serve us. We don’t need to make “Slaves” with human or better level intelligence, and the sooner this meme gets replaced by something far saner – namely the idea that we are creating our equals, not our slaves – the better.

