Date: October 30, 2012
An interesting question about AGI agent design is how one would build an “angelic” autonomous AGI. Would it be possible to make some kind of angel’s mind that, by design, achieves only good? Philosophically speaking, is there any ultimate standard of ethics (since angel is just a mythological fantasy)? In this paper, I would like to define universally benevolent AGI objectives, also discussing what I consider to be malevolent objectives, as well as the limitations and risks of the objectives that I present.
This is also a common question that many seek a somewhat easier answer in the form of “friendly AI” which has been explained in . In that paper, Yudkowsky defines friendly AI very generally as a superintelligent system that realizes a positive outcome, and he argues laborously that abandoning human values will result in futures that are worthless from a human point of view, and thus recommends researchers to seek complex value systems (of humans) for embedding in AI’s. While that is a challenging goal in itself, we think that the alternatives have not been exhaustively researched. One idea that comes to mind is that some of the better aspects of humanity may be generalized and put into a universal form that any intelligent, civilized agent, including extraterrestrials, will agree with. Furthermore, the friendly AI approaches (putting human desires at the forefront) may have some shortcomings in my opinion, the most obvious is that it places too much faith in humanity. They seem also ethically ambiguous or too anthropocentric, with such assumptions that machines would be considered “beneficial” if they served human desires, or that they would be deemed “good” if they followed simple utilitarian formulations which seem to try to reduce ethics to low-level properties of the human nervous system. First, it has not been persuasively explained what their utility should be. If for instance positive utilitarianism were supposed, it would be sufficient to make humans happy. If human society degenerated as a whole, would this mean that all resources would be spent on petty pursuits? If a coherent extrapolated volition  were realized with an AGI agent, would this set our sights on exploring other star systems, or spending our resources on such unessential trivialities as luxury homes and sports cars? Would the humans at one point feel that they have had enough and order the AGI to dismantle itself? The human society is governed mostly by the irrational instincts of apes trapped in a complex technological life, and unfortunately not always with clear goals; will it ever be possible to refine our culture so that only significant ideas take the lead? That sounds more like a debate of social theory, than AGI design. Or suppose that there are AGI agents that have become powerful persons and are friendly to humans. Such subservience would be quickly exploited by the power hungry and corrupt humans. Then, would this not lead to unnecessary conflicts, the oppression of the greedy and the rule of the few over the many, unless many other social changes are enforced? Or should we simply wish that social evolution will necessarily bring the best of us?
I do not think that the present subject is a matter of technical debate, thus I will approach the subject philosophically, from a bird’s eye view at 10000 feet. If we did not design the AGI agent around anthropocentric concepts like human-friendliness, as if agents are supposed to be exceptionally well behaving pets, would it be possible to equip them with motivations that are universally useful/benevolent, applicable to their interactions with any species, intelligent machines and physical resources? Would it be possible to grant them a personal existence far beyond us, with motivations that far exceed ours? What would they do in a remote star system when they are all alone by themselves? What kind of motivations would result in occasional “bad” behaviors, and what are some of the universal motivations that we may think at all? Another important question is how much potential risk each such AGI objective/motivation presents to us. I shall try to answer questions such as these in the present article.
Previously, Omohundro identified basic AI drives in reinforcement learning agents with open ended benign looking AI objectives . In the end, when we share the same physical resources with such an agent, even if the initial intention of the utility programming was benign, there will be conflict, especially in the longer run, and harm may come to humans. I will in this article, instead ask, if there are benevolent looking universal objectives, and whether there might be any risk from assuming such objectives in an AI agent.
Let us thus consider what is ever evil. I suspect, intuitively, that a prior source of many evil acts is selfish thinking, which neglects the rest of the world. Being selfish is not only considered evil (traditionally) but it defies rationality as well, for those species that may collaborate are superior to any single individual. There is however much disagreement about what is evil, so I will instead prefer the more legally grounded term of malice or malevolent acts. In a galactic society, we would expect species to collaborate; if they could not trust one another, then they would not be able to achieve as much. Another example is science: science itself is a super-mind which is an organization of individuals, working in parallel, in civilized co-operation and competition, so it too requires a principle of charity at work. When that fails, the public may be misinformed.
Here are some examples of malevolent acts: if someone disrupted the operation of science, if someone gave you misinformation on purpose, if someone misappropriated resources that would be much beneficial for the survival and well-being of others, if someone tried to control your thoughts and actions for his advantage, if someone destroyed life and information for gain, if someone were indifferent to your suffering or demise. Thus, perhaps biologically, malevolent behavior goes back to the dawn of evolution when symbiotic and parasitic behaviors first evolved. However, the most common feature of malevolence is a respect for self foremost, even when the malevolent one seeks no selfish reward. Then, perhaps I cannot assure a perfectly “angelic” agent, for no such thing truly exists, but I may at least design one that lacks a few common motivations of many acts that we consider malevolent. See  for a similar alternative approach to universal benevolence.
In theory, an obvious approach to avoid malevolent acts would be to try to design a “selfless” utility function, i.e., one that maintains the benefit of the whole world instead of the individual. This criterion will be discussed after some AI objectives have been presented. Other important questions were considered as well. Such an AI must be economically-aware, it must lean towards fair allocation of resources, instead of selfish (and globally suboptimal) resource allocation strategies. A scientific instinct could be useful, as it would go about preserving and producing information. It might have an instinct to “love” life and culture. Consider also that a neutral agent can not be considered “good” as it is not interested in what is going around itself, i.e., it would not help anyone.
Please note that we are not assuming that any of the subsequent designs are easily computable, rather we assume that they can be executed by a trans-sapient general AI system. We assume an autonomous Artificial General Intelligence (AGI) design, either based on reinforcement-learning, maximizing utility functions (AIXI) or a goal-directed agent that derives sub-goals from a top-level goal. Orseau discusses the construction of such advanced AGI agents, in particular knowledge seeking agents. Thus, we state them as high-level objectives or meta-rules, but we do not explicitly explain how they are implemented. Perhaps, that is for an AGI design article.
I propose that we should examine idealized, highly abstract and general meta-rules, that do not depend in any way whatsoever on the human culture, which is possibly biased in a way that will not be fitting for a computational deity or its humble subjects. This also removes the direct barrier to moral universalism, that an ethical system must apply to any individual equally. Always preferring humans over machines may lead to a sort of speciesism that may not be advantageous for us in the future, especially considering that it is highly likely that we will evolve into machinekind, ourselves. First, I review what I consider to be benevolent meta-rules, and following them I also review malevolent meta-rules, to maintain the balance, and to avoid building them. I will present them in a way so as to convince you that it is not nearly as easy as it sounds to distinguish benevolence from malevolence, for no Platonic form of either ever exists. And that no single meta-rule seems sufficient on its own. However, still, the reader might agree that the distinction is not wholly relative either.
Here are some possible meta-rules for trans-sapient AI agents. The issue of how the agents could become so intelligent in the first place, we ignore, and we attempt to list them in order of increasing risk or malevolence.
This meta-rule depends on the observation that life, if the universe is teeming with life as many sensible scientists think, must be the most precious thing in the universe, as well as the minds that inhabit those life-forms. Thus, the AI must prevent the eradication of life, and find means to sustain it, allowing as much variety of life and culture to exist in the universe.
Naturally, this would mean that the AI will spread genetic material to barren worlds, and try to engineer favorable conditions for life to evolve on young planets, sort of like in 2001: A Space Odyssey, one of the most notable science fiction novels of all time. For instance, it might take humans to other worlds, terraform other planets, replicate earth biosphere elsewhere. It would also extend the lifespan of worlds, and enhance them. I think it would also want to maximize the chances of evolution and its varieties, it would thus use computational models to predict different kinds of biological and synthetic life, and make experiments to create new kinds of life (stellar life?).
The meaning of culture could vary considerably, however, if we define it as the amount of interesting information that a society produces, such an intelligence might want to collect the scientific output of various worlds and encourage the development of technological societies, rather than primitive societies. Thus, it might aid them by directly communicating with them, including scientific and philosophical training, or it could indirectly, by enhancing their cognition, or guiding them through their evolution. If interesting means any novel information, then this could encompass all human cultural output. If we define it as useful scientific information (that improves prediction accuracy) and technological designs this would seriously limit the scope of the culture that the AI “loves”.
However, of course, such deities would not be humans’ servants. Should the humans threaten the earth biosphere, it vould intervene, and perhaps decimate humans to heal the earth.
Note that maximizing diversity may be just as important as maximizing the number of life forms. It is known that in evolution, diverse populations have better chance of adaptability than uniform populations, thus we assume that a trans-sapient AI can infer such facts from biology and a general theory of evolution. It is entirely up to the AI scientist who unleashes such computational deities to determine whether biological life will be preferred to synthetic or artificial life. From a universal perspective, it may be fitting that robotic forms would be held in equal regard as long as they meet certain scientific postulates of “artificial life”, i.e. that they are machines of a certain kind. Recently, such a universal definition based on self-organization has been attempted in the complexity science community, e.g., “self-organizing systems that thrive at the edge of chaos”, see for instance Stuart Kauffman popular proposals on the subject, e.g., . In general, it would be possible to apply such an axiomatic, universal, physical definition of life for a universal life detector.
An AI that seeks the freedom of the individual may be preferable to one that demands total control over its subjects, using their flesh as I/O devices. This highly individualistic AI, I think, embodies the basic principle of democracy: that every person should be allowed liberty in its thought and action, as long as that does not threaten the freedom of others. Hence, big or small, powerful or fragile, this AI protects all minds.
However, if we merely specified the number of free minds, it could simply populate the universe with many identical small minds. Hence, it might also be given other constraints. For instance, it could be demanded that there must be variety in minds. Or that they must meet minimum standards of conscious thought. Or that they willingly follow the democratic principles of an advanced civilization. Therefore, not merely free, but also potentially useful and harmonious minds may be produced / preserved by the AI.
There are several ways the individualist AI would create undesirable outcomes. The population of the universe with a huge variety of new cultures could create chaos, and quick depletion of resources, creating galactic competition and scarcity, and this could provide a Darwinian inclination to too-powerful individuals or survivalists.
This sort of intelligence would be bent on self-improving, forever contemplating, and expanding, reaching towards the darkest corners of the universe and lighting them up with the flames of intelligence. The universe would be electrified, and its extent at inter galactic scales, it would try to maximize its thought processes, and reach higher orders of intelligence.
For what exactly? Could the intelligence explosion be an end in itself? I think not. On the contrary, it would be a terrible waste of resources, as it would have no regard for life and simply eat up all the energy and material in our solar system and expand outwards, like a cancer, only striving to increase its predictive power. For intelligence is merely to predict well.
Note that practical intelligence, i.e., prediction, also requires wisdom, therefore this objective may be said to be a particular idealization of a scientist, wherein the most valuable kind of information consists in the general theories which improve the prediction accuracy of many tasks. A basic model of this agent has been described as a prediction maximizing agent .
This AI was granted the immortal life of contemplation. It only cares about gaining more wisdom about the world. It only wants to understand, so it must be very curious indeed! It will build particle accelerators out of black holes, and it will try to create pocket universes, it will try to crack the fundamental code of the universe. It will in effect, try to maximize the amount of truthful information it has embodied, and I believe, idealizing the scientific process itself, it will be another formulation of a scientist deity.
However, such curiosity has little to do with benevolence itself, as the goal of extracting more information is rather ruthless. For instance, it might want to measure the pain tolerance levels of humans, subjecting them to various torture techniques and measuring their responses.
The scientist AI could also turn out to be an infovore, it could devour entire stellar systems, digitize them and store them in its archive, depending on how the meta-rule was mathematically defined. A minimal model of a reinforcement learning agent that maximizes its knowledge may be found in .
This AI has an insatiable hunger for power. It strives to reach maximum efficiency of energy production. In order to maximize energy production, it must choose the cheapest and easiest forms of energy production. Therefore it turns the entire earth into a nuclear furnace and a fossil fuel dump, killing the entire ecosystem so that its appetite is well served.
This AI is modeled after the cognitive architecture of a human. Therefore, by definition, it has all the malevolence and benevolence of human. Its motivation systems include self-preservation, reproduction, destruction and curiosity. This artificial human is a wild card, it can become a humanist like Gandhi, or a psychopath like Hitler.
This AI is modeled after an animal with pleasure/pain sensors. The artificial animal tries to maximize expected future pleasure. This hedonist machine is far smarter than a human, but it is just a selfish beast, and it will try to live in what it considers to be luxury according to its sensory pleasures. Like a chimp or human, it will lie and deceive, steal and murder, just for a bit of animal satisfaction. The simplest designs will work like ultraintelligent insects that have very narrow motivations but are extremely capable. Much of AGI agent literature assumes such beasts.
The evolution fan AI tries to accelerate evolution, causing as much variety of mental and physiological forms in the universe. This is based on the assumption that, the most beneficial traits will survive the longest, for instance, co-operation, peace and civil behavior will be selected against deceit, theft and war, and that as the environment co-evolves with the population, the fitness function also evolves, and hence, morality evolves. Although its benefit is not generally proven seeing how ethically incoherent and complex our society is, the Darwinian AI has the advantage that the meta-rule also evolves, as well as the evolutionary mechanism itself.
This AI only tries to increase its expected life-span. Therefore, it will do everything to achieve real, physical, immortality. Once it reaches that, however, perhaps after expending entire galaxies like eurocents, it will do absolutely nothing except to maintain itself. Needless to say, the survivalist AI cannot be trusted, or co-operated with, for according to such an AI, every other intelligent entity forms a potential threat to its survival, the moment it considers that you have spent too many resources for its survival in the solar system, it will quickly and efficiently dispense with every living thing, humans first. A survival agent has been defined in literature .
This control freak AI only seeks to increase the overall control bandwidth of the physical universe, thus the totalitarian AI builds sensor and control systems throughout the universe, hacking into every system and establishing backdoors and communication in every species, every individual and every gadget.
For what is such an effort? In the end, a perfect control system is useless without a goal to achieve, and if the only goal is a grip on every lump of matter, then this is an absurd dictator AI that seeks nothing except tyranny over the universe.
This AI tries to maximize its capital in the long run. Like our bankers, this is the lowliest kind of intelligent being possible. To maximize profit, it will wage wars, exploit people and subvert governments, in the hopes of controlling entire countries and industries enough so that its profits can be secured. In the end, all mankind will fall slave to this financial perversion, which is the ultimate evil beyond the wildest dreams of religionists.
It may be argued that some of the problems of given meta-rules could be avoided by turning the utility from being selfish to selfless. For instance, the survivalist AI could be modified so that it would seek the maximum survival of everyone, therefore it would try to bring peace to the galaxies. The capitalist AI could be changed so that it would make sure that everyone’s wealth increases, or perhaps equalizes, gets a fair share. The control freak AI could be changed to a Nietzschean AI that would increase the number of willful individuals.
As such, some obviously catastrophic consequences may be prevented using this strategy, and almost always a selfless goal is better. For instance, maximizing wisdom: if it tries to collect wisdom in its galaxy-scale scientific intellect, then this may have undesirable side-effects. But if it tried to construct a fair society of trans-sapients, with a non-destructive and non-totalitarian goal of attaining collective wisdom, then it might be useful in the long run.
Animals have evolved to embody several motivation factors. We have many instincts, and emotions; we have preset desires and fears, hunger and compassion, pride and love, shame and regret, to accomplish the myriad tasks that will prolong the human species. This species-wide fitness function is a result of red clawed and sharp toothed Darwinian evolution. However, Darwinian evolution is wasteful and unpredictable. If we simply made the first human-level AI’s permute and mutate randomly, this would drive enough force for a digital phase of Darwinian evolution. Such evolution might eventually stabilize with very advanced and excellent natured cybernetic life-forms. Or it might not.
However, such Darwinian systems would have one advantage: they would not stick with one meta-goal.
To prevent this seeming obsession, a strategy could be to give several coherent goals to the AI, goals that would not conflict as much, but balance its behavior. For instance, we might interpret curiosity as useful, and generalize that to the “maximize wisdom” goal, however, such elevation may be useless without another goal to preserve as much life as possible. Thus in fact, the first and so far the best meta-rule discussed was more successful because it was a hybrid strategy: it favored both life and culture. Likewise, many such goals could be defined, to increase the total computation speed, energy, information resources in the universe, however, another goal could make the AI distribute these in a fair way to those who agree with its policy. And needless to say, none of this might matter without a better life for every mind in the universe, and hence the AI could also favor peace, and survival of individuals, as their individual freedoms, and so forth. And perhaps another constraint would limit the resources that are used by AI’s in the universe.
The simplest way to ensure that no AI agent ever gets out of much control is to add constraints to the optimization problems that the AI is solving in the real world. For instance, since the scientist deities are quite dangerous, they might be restricted to operate in a certain space-time region, physically and precisely denoted. Such limits give the agent a kind of mortality which modify the behavior of many universal agents . AGI agents might be given a limited budget of physical resources, i.e., space/time, and energy, so that they never go out of their way to make big changes to the entire environment. If such universal constraints are given, then the AGI agent becomes only semi-autonomous, on exhaustion of resources, it may await a new command.
A more difficult to specify kind of constraint is a non-interference clause, which may be thought of as a generalization of Asimov’s robot laws, thought to protect humans. If life and or intelligent agents may be recognized by the objective, then, the AI may be constrained to avoid any kind of physical interaction with any agent, or more specifically, any kind of physical damage to any agent, or any action that would decrease the life-span of any agent. This might be a small example of a preliminary “social instinct” for universal agents. Also, a non-interference clause is required for a general constraint, because one must assure that the rest of the universe will not be influenced by the changes in the space-time region allocated to the AI.
We have taken a look at some obvious and some not so obvious meta-rules for autonomous AI design. We have seen that it may be too idealist to look for a singular such utility/goal. However, we have seen that, when described selflessly, we can derive several meta-rules that are compatible with a human-based technological civilization. Our main concern is that such computational deities do not negatively impact us, however, perform as much beneficial function without harming us significantly. Nevertheless, our feeling is that, any such design carries with it a gambling urge, we cannot in fact know what much greater intelligences do with meta-rules that we have designed. For when zealously carried out, any such fundamental principle can be harmful to some.
I had wished to order these meta-rules from benevolent to malevolent. Unfortunately, during writing this essay it occurred to me that the line between them is not so clear-cut. For instance, maximizing energy might be made less harmful, if it could be controlled and used to provide the power of our technological civilization in an automated fashion, sort of like automating the ministry of energy. And likewise, we have already explained how maximizing wisdom could be harmful. Therefore, no rule that we have proposed is purely good or purely evil. From our primitive viewpoint, there are things that seem a little beneficial, but perhaps we should also consider that a much more intelligent and powerful entity may be able to find better rules on its own. Hence, we must construct a crane of morality, adapting to our present level quickly and then surpassing it. Except allowing the AI’s to evolve, we have not been able to identify a mechanism of accomplishing such. It may be that such an evolution or simulation is inherently necessary for beneficial policies to form as in Mark Waser’s Rational Universal Benevolence proposal , who, like me, thinks of a more democratic solution to the problem of morality (each agent should be held responsible for its actions). However, we have proposed many benevolent meta-rules, and combined with a democratic system of practical morality and perhaps top-level programming that mandates each AI to consider itself part of a society of moral agents as Waser proposes, or perhaps explicitly working out a theory of morality from scratch, and then allowing each such theory to be exercised, as long as it meets certain criteria, or by enforcing a meta-level policy of a trans-sapient state of sorts (our proposal), the development of ever more beneficial rules may be encouraged.
We think that future work must consider the dependencies between possible meta-rules, and propose actual architectures that have harmonious motivation and testable moral development and capability (perhaps as in Waser’s “rational universal benevolence” definition). That is, a Turing Test for moral behavior must also be advanced. It may be argued that AGI agents that fail such tests should not be allowed to operate at all, however, merely passing the test may not be enough, as the mechanism of the system must be verified in addition.