Saving the World with Analytical Philosophy
Stuart Armstrong, a former mathematician currently employed as a philosopher at Oxford University’s Future of Humanity Institute, has recently released an elegant little booklet titled Smarter Than Us. The theme is the importance of AGI to the future of the world. While not free, the booklet is available for purchase online in PDF form for a suggested donation of $5 and a minimum donation of 25 cents.
Armstrong wrote Smarter Than Us at the request of the Machine Intelligence Research Institute, formerly called the Singularity Institute for AI — and indeed, the basic vibe of the booklet will be very familiar to anyone who has followed SIAI/MIRI and the thinking of its philosopher-in-chief Eliezer Yudkowsky. Armstrong, like the SIAI/MIRI folks, is an adherent of the school of thought that the best way to work toward an acceptable future for humans is to try to figure out how to create superintelligent AGI systems that are provably going to be friendly to humans, even as those systems evolve and use their intelligence to drastically improve themselves.
The booklet is clearly written — very lucid and articulate, and pleasantly lacking the copious use of insider vocabulary that marks much of the writing of the MIRI community. It’s worth reading as an elegant representation of a certain perspective on the future of AGI, humanity and the world.
Having said that, though, I also have to add that I find some of the core ideas in the book highly unrealistic.
The title of this article summarizes one of my main disagreements. Armstrong seriously seems to believe that doing analytical philosophy (specifically, moral philosophy aimed at formalizing and clarifying human values so they can be used to structure AGI value systems) is likely to save the world.
I really doubt it!
The Promise and Risk of AGI
Armstrong and I are both lapsed mathematicians, and we both agree generally with 20th century mathematician I.J. Good’s sentiment that “the first intelligent machine is the last invention humanity will ever make.” In fact Armstrong makes a stronger statement, to wit:
Over the course of a generation or two from the first creation of AI — or potentially much sooner — the world will come to resemble whatever the AI is programmed to prefer. And humans will likely be powerless to stop it.
I actually think this goes too far — it assumes that the first highly powerful AGI on Earth is going to have a desire to reshape the world according to its preferences. It may not. It may well feel that it’s better just to leave much of the world as-is, and proceed with its own business. But in any case, there’s no doubt Armstrong gets the transformative power AGI is going to have. Like me, he believes that human-level AGI will transform human society massively; and that it will also fairly rapidly invent superhuman AGI, which will have at least the potential to — if it feels like it — transform things much more massively.
Armstrong thinks this seems pretty risky. Rightly enough, he observes that if someone happened to have a breakthrough and create a superhuman AGI today, we would really have no way to predict what this AGI would wreak upon the human world. Heaven? Hell? Something utterly incomprehensible? Quick annihilation?
Saving the World with Analytical Philosophy
I agree with Armstrong that creating superhuman AGI, with our present level of knowledge, would be extremely risky and uncertain.
Where I don’t agree with him is regarding the solution to this problem. His view, like that of his MIRI comrades, is that the best approach is to try to create an AGI whose “Friendliness” to humans can be formally proved in some way.
This notion wraps up a lot of problems, of which the biggest are probably the following four:

1. It’s intuitively, commonsensically implausible that we’re going to be able to closely predict or constrain the behavior of a mind massively more intelligent than ourselves.
2. It seems very hard to constrain the future value system and interests of an AGI system that is able to rewrite its own source code and rebuild its own hardware. Such an AGI seems very likely to self-modify into something very different than its creators intended, working around any constraints they placed on it in ways they didn’t predict.
3. Proving anything rigorous and mathematical, and also useful, about superintelligent self-modifying AGIs in the real world seems beyond the scope of current mathematics. It may or may not be possible, but we don’t seem to have the mathematical tools for it presently.
4. Even if we could somehow build an AGI that could be mathematically proven to never revise its value system even as it improves its intelligence — how would we specify its initial value system?
Many of Armstrong’s friends at MIRI are focusing on Problem 3, trying to prove theorems about superintelligent self-modifying AGIs. So far they haven’t come up with anything remotely useful — though the quest has helped them generate some moderately interesting math, which, however, doesn’t tell you anything about actual AGI systems in the (present or future) real world.
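For a sense of the flavor of that math (this is my gloss, not a summary of any particular MIRI paper): much of the theorem-proving work circles around self-reference results such as Löb’s theorem, which says that for any sufficiently strong formal system T and any sentence P,

$$ \text{if} \quad T \vdash \big(\mathrm{Prov}_T(\ulcorner P \urcorner) \rightarrow P\big) \quad \text{then} \quad T \vdash P. $$

Loosely, the worry is that an agent reasoning within a fixed proof system T cannot, in general, trust a successor built on that same system, because asserting “anything T proves is true” across the board would, by Löb’s theorem, let it prove anything at all. That is the sort of obstacle these papers wrestle with.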
Armstrong, on the other hand, spends more time on Problem 4. This is an aspect of the overall problem that MIRI/FHI have not spent much time on so far. The most discussed solution to come out of this group is Yudkowsky’s notion of “Coherent Extrapolated Volition”, which has many well documented flaws, including some discussed here.
One of Armstrong’s conclusions regarding Problem 4 is that, as he puts it, “We Need to Get It All Exactly Right.” Basically, he thinks we need to quite precisely formally specify the set of human values, because otherwise an AGI is going to incline toward creating its own values, which may not be at all agreeable to us. As he puts it:
Okay, so specifying what we want our AIs to do seems complicated. Writing out a decent security protocol? Also hard. And then there’s the challenge of making sure that our protocols haven’t got any holes that would allow a powerful, efficient AI to run amok.
But at least we don’t have to solve all of moral philosophy . . . do we?
Unfortunately, it seems that we do.
My father Ted Goertzel, a sociologist, gave a talk in 2012 at the Future of Humanity Institute’s AGI Safety and Impacts conference, which was coupled with the AGI-12 AGI research conference, part of the AGI conference series I organize each year. (Come to AGI-14 in Quebec City Aug 1-4 2014 if you can, by the way!) During his talk, he posed the following question to the FHI folks (I can’t remember the exact wording, but here is the gist):
When, in human history, have philosophers philosophized an actually workable, practical solution to an important real-world problem and saved the day?
Nobody in the audience had an answer.
My dad has always been good at bringing things down to Earth.
Philosophy, then, has a rather thin track record of delivering practical solutions to pressing real-world problems. Mathematics, on the other hand, will almost surely be part of any future theory of AGI, and will likely be very helpful with AGI development one day.
However, this doesn’t necessarily mean that highly mathematical approaches to AGI are the best route at this stage. We must remember that mathematical rigor is of limited value unto itself. A mathematical theory describing something of no practical relevance (e.g. Hutter’s AIXI, an AGI design that requires infinitely powerful computers; or recent MIRI papers on Löbian issues; etc.) is not more valuable than a non-rigorous theory that says useful things about practical situations. Sure, an irrelevant math theory can sometimes be a step on the way to a powerfully relevant math theory; but often an irrelevant math theory is just a step on the way to more irrelevant math theories…
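To make the AIXI example a bit more concrete: the agent is usually written down (roughly, following Hutter) as an expectimax over all programs for a universal Turing machine U, something like

$$ a_k := \arg\max_{a_k} \sum_{o_k r_k} \cdots \max_{a_m} \sum_{o_m r_m} \big[\, r_k + \cdots + r_m \,\big] \sum_{q \,:\, U(q,\, a_1 \dots a_m) \,=\, o_1 r_1 \dots o_m r_m} 2^{-\ell(q)}, $$

where the a are actions, the o observations and the r rewards, and the inner sum runs over every program q consistent with the interaction history, weighted by 2 raised to minus its length ℓ(q). (The notation here is schematic; see Hutter’s Universal Artificial Intelligence for the careful version.) That sum over all programs is exactly what makes the agent uncomputable, however crisp the definition looks on paper.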
Oftentimes, in the development of a new scientific area, a high-quality non-rigorous theory comes first — e.g. Faraday’s field lines or Feynman diagrams, or Darwin’s natural selection theory — and then after some time has passed, a rigorous theory comes along. Chasing rigor can sometimes be a distraction from actually looking at the real phenomena at hand….
A Muted Argument Against Pragmatic AGI Approaches
Armstrong also argues, though in a fairly muted way, that the AGI approaches currently being pursued are unlikely to yield safety. As he puts it in the booklet:

… [T]hough it is possible to imagine a safe AI being developed using the current approaches (or their descendants), it feels extremely unlikely.
Theory or Experiment First?
Armstrong’s booklet ends a bit anticlimactically, with a plea to donate money to MIRI or FHI, so that their philosophical and formal exploration of issues related to the future of AGI can continue. Actually I agree these institutions should be funded — while I disagree with many of their ideas, I’m glad they exist, as they keep a certain interesting dialogue active. But I think massively more funding should go into the practical creation and analysis of AGI systems. This, not abstract philosophizing, is going to be the really useful source of insights into the future of AGI and its human implications.
Postscript: Armstrong Responds

Stuart Armstrong was kind enough to respond to the originally posted version of this article (which was the same as the current version but without this section) in the comments area below the article. He said:
What I expect from formal “analytic philosophy” methods:
Of all these, I think (4) is the only one that’s reasonably likely to come about. Philosophy is an awful lot better at finding holes and hidden assumptions than at finding solutions. Of course (4) in itself could be a fantastic, perhaps even critical, value-added.