There are thorny problems with the legacy principle as well. Given how ethical views have evolved since the Middle Ages regarding slavery, women’s rights, etc., would we really want people from 1,500 years ago to have a lot of influence over how today’s world is run? If not, why should we try to impose our ethics on future beings that may be dramatically smarter than us? Are we really confident that superhuman AGI would want what our inferior intellects cherish? This would be like a four-year-old imagining that once she grows up and gets much smarter, she’s going to want to build a gigantic gingerbread house where she can spend all day eating candy and ice cream. Like her, life on Earth is likely to outgrow its childhood interests. Or imagine a mouse creating human-level AGI, and figuring it will want to build entire cities out of cheese. On the other hand, if we knew that superhuman AI would one day commit cosmocide and extinguish all life in our Universe, why should today’s humans agree to this lifeless future if we have the power to prevent it by creating tomorrow’s AI differently?
In conclusion, it’s tricky to fully codify even widely accepted ethical principles into a form applicable to future AI, and this problem deserves serious discussion and research as AI keeps progressing. In the meantime, however, let’s not let perfect be the enemy of good: there are many examples of uncontroversial “kindergarten ethics” that can and should be built into tomorrow’s technology. For example, large civilian passenger aircraft shouldn’t be allowed to fly into stationary objects, and now that virtually all of them have autopilot, radar and GPS, there are no longer any valid technical excuses. Yet the September 11 hijackers flew three planes into buildings and suicidal pilot Andreas Lubitz flew Germanwings Flight 9525 into a mountain on March 24, 2015—by setting the autopilot to an altitude of 100 feet (30 meters) above sea level and letting the flight computer do the rest of the work. Now that our machines are getting smart enough to have some information about what they’re doing, it’s time for us to teach them limits. Any engineer designing a machine needs to ask if there are things that it can but shouldn’t do, and consider whether there’s a practical way of making it impossible for a malicious or clumsy user to cause harm.
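As a concrete illustration of such a built-in limit, here is a minimal Python sketch of how an altitude command might be checked against terrain before being executed. Every name and number in it (validate_target_altitude, MIN_CLEARANCE_FT, the terrain input) is hypothetical and purely illustrative, not drawn from any real avionics system.

```python
# Minimal sketch of a built-in "kindergarten ethics" limit for an autopilot.
# All names and numbers here are hypothetical, not taken from any real
# avionics interface.

MIN_CLEARANCE_FT = 1000  # illustrative minimum clearance above terrain


def validate_target_altitude(target_altitude_ft: float,
                             terrain_elevation_ft: float) -> float:
    """Return a safe altitude setting.

    terrain_elevation_ft is assumed to be the highest terrain elevation
    along the planned path (a hypothetical input, e.g. from an onboard
    terrain database). A command below the safe floor is clamped rather
    than executed blindly.
    """
    floor_ft = terrain_elevation_ft + MIN_CLEARANCE_FT
    return max(target_altitude_ft, floor_ft)


# A Germanwings-style command of 100 feet over high terrain gets overridden:
print(validate_target_altitude(100, terrain_elevation_ft=9000))  # -> 10000
```

The point of the sketch is not the particular numbers but the design question it embodies: the machine has enough information to know the command is lethal, so the engineer can choose to make obeying it impossible.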
Ultimate Goals?
This chapter has been a brief history of goals. If we could watch a fast-forward replay of our 13.8-billion-year cosmic history, we’d witness several distinct stages of goal-oriented behavior:
1. Matter seemingly intent on maximizing its dissipation
2. Primitive life seemingly trying to maximize its replication
3. Humans pursuing not replication but goals related to pleasure, curiosity, compassion and other feelings that they’d evolved to help them replicate
4. Machines built to help humans pursue their human goals
If these machines eventually trigger an intelligence explosion, then how will this history of goals ultimately end? Might there be a goal system or ethical framework that almost all entities converge to as they get ever more intelligent? In other words, do we have an ethical destiny of sorts?
A cursory reading of human history might suggest hints of such a convergence: in his book The Better Angels of Our Nature, Steven Pinker argues that humanity has been getting less violent and more cooperative for thousands of years, and that many parts of the world have seen increasing acceptance of diversity, autonomy and democracy. Another hint of convergence is that the pursuit of truth through the scientific method has gained in popularity over the past millennia. However, it may be that these trends show convergence not of ultimate goals but merely of subgoals. For example, figure 7.2 shows that the pursuit of truth (a more accurate world model) is simply a subgoal of almost any ultimate goal. Similarly, we saw above how ethical principles such as cooperation, diversity and autonomy can be viewed as subgoals, in that they help societies function efficiently and thereby help them survive and accomplish any more fundamental goals that they may have. Some may even dismiss everything we call “human values” as nothing but a cooperation protocol, helping us with the subgoal of collaborating more efficiently. In the same spirit, looking ahead, it’s likely that any superintelligent AIs will have subgoals including efficient hardware, efficient software, truth-seeking and curiosity, simply because these subgoals help them accomplish whatever their ultimate goals are.
Indeed, Nick Bostrom argues strongly against the ethical destiny hypothesis in his book Superintelligence, presenting a counterpoint that he terms the orthogonality thesis: that the ultimate goals of a system can be independent of its intelligence. By definition, intelligence is simply the ability to accomplish complex goals, regardless of what these goals are, so the orthogonality thesis sounds quite reasonable. After all, people can be intelligent and kind or intelligent and cruel, and intelligence can be used for the goal of making scientific discoveries, creating beautiful art, helping people or planning terrorist attacks.8
The orthogonality thesis is empowering by telling us that the ultimate goals of life in our cosmos aren’t predestined, but that we have the freedom and power to shape them. It suggests that guaranteed convergence to a unique goal is to be found not in the future but in the past, when all life emerged with the single goal of replication. As cosmic time passes, ever more intelligent minds get the opportunity to rebel and break free from this banal replication goal and choose goals of their own. We humans aren’t fully free in this sense, since many goals remain genetically hardwired into us, but AIs can enjoy this ultimate freedom of being fully unfettered from prior goals. This possibility of greater goal freedom is evident in today’s narrow and limited AI systems: as I mentioned earlier, the only goal of a chess computer is to win at chess, but there are also computers whose goal is to lose at chess and which compete in reverse chess tournaments where the goal is to force the opponent to capture your pieces. Perhaps this freedom from evolutionary biases can make AIs more ethical than humans in some deep sense: moral philosophers such as Peter Singer have argued that most humans behave unethically for evolutionary reasons, for example by discriminating against non-human animals.
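To make the orthogonality thesis concrete, here is a toy Python sketch in the spirit of the chess and reverse-chess example above. The tiny "position" and the function names are invented for illustration; nothing here is a real chess engine.

```python
# Toy illustration of the orthogonality thesis: the same decision-making
# machinery can serve opposite goals. The "position" below is hypothetical:
# each available move simply yields some number of points.

from typing import Callable, Dict

# Hypothetical position: available moves and the score each one yields.
position: Dict[str, int] = {"a": 3, "b": -1, "c": 7}


def choose_move(moves: Dict[str, int],
                evaluate: Callable[[int], float]) -> str:
    """Generic 'intelligence': pick the move whose outcome the evaluation
    function rates highest. The goal lives entirely in `evaluate`,
    not in the search."""
    return max(moves, key=lambda m: evaluate(moves[m]))


def win_seeking(score: int) -> float:
    return score       # goal: maximize the score


def loss_seeking(score: int) -> float:
    return -score      # goal: minimize it, in the reverse-chess spirit


print(choose_move(position, win_seeking))   # -> "c"
print(choose_move(position, loss_seeking))  # -> "b"
```

Swapping in a different evaluation function changes what the agent "wants" without changing how capable it is, which is exactly the independence of goals and intelligence that the orthogonality thesis asserts.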
We saw that a cornerstone in the “friendly-AI” vision is the idea that a recursively self-improving AI will wish to retain its ultimate (friendly) goal as it gets more intelligent. But how can an “ultimate goal” (or “final goal,” as Bostrom calls it) even be defined for a superintelligence? The way I see it, we can’t have confidence in the friendly-AI vision unless we can answer this crucial question.
In AI research, intelligent machines typically have a clear-cut and well-defined final goal, for instance to win the chess game or drive the car to the destination legally. The same holds for most tasks that we assign to humans, because the time horizon and context are known and limited. But now we’re talking about the entire future of life in our Universe, limited by nothing but the (still not fully known) laws of physics, so defining a goal is daunting! Quantum effects aside, a truly well-defined goal would specify how all particles in our Universe should be arranged at the end of time. But it’s not clear that there exists a well-defined end of time in physics. If the particles are arranged in that way at an earlier time, that arrangement will typically not last. And what particle arrangement is preferable, anyway?
We humans tend to prefer some particle arrangements over others; for example, we prefer our hometown arranged as it is over having its particles rearranged by a hydrogen bomb explosion. So suppose we try to define a goodness function that associates a number with every possible arrangement of the particles in our Universe, quantifying how “good” we think this arrangement is, and then give a superintelligent AI the goal of maximizing this function. This may sound like a reasonable approach, since describing goal-oriented behavior as function maximization is popular in other areas of science: for example, economists often model people as trying to maximize what they call a “utility function,” and many AI designers train their intelligent agents to maximize what they call a “reward function.” When we’re talking about the ultimate goals for our cosmos, however, this approach poses a computational nightmare, since it would need to define a goodness value for every one of more than a googolplex possible arrangements of the elementary particles in our Universe, where a googolplex is 1 followed by 10^100 zeroes—more zeroes than there are particles in our Universe. How would we define this goodness function to the AI?
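To see why exhaustive enumeration is hopeless, consider a toy sketch in which the "universe" is just n sites that can each be empty or occupied, giving 2^n possible arrangements. The goodness function below is a placeholder invented purely for illustration; it makes no claim to capture real human values.

```python
# Toy illustration of why defining a goodness function by exhaustive
# enumeration cannot scale. The "universe" is hypothetical: n sites,
# each empty (0) or occupied (1), so there are 2**n arrangements.

import itertools
from typing import Tuple


def goodness(arrangement: Tuple[int, ...]) -> int:
    """Placeholder goodness function: it just counts occupied sites.
    Any real proposal would need to rank arrangements far more subtly."""
    return sum(arrangement)


def best_arrangement(n_sites: int) -> Tuple[int, ...]:
    """Brute-force maximization over all 2**n arrangements."""
    return max(itertools.product([0, 1], repeat=n_sites), key=goodness)


print(best_arrangement(4))  # 16 arrangements to check: easy
print(2 ** 20)              # already about a million arrangements for 20 sites
# For the particles in our Universe, the number of arrangements exceeds a
# googolplex, 10**(10**100), so explicit enumeration is out of the question.
```

Even in this cartoon setting, the table of goodness values doubles with every added site, which is why any workable approach would have to specify the goodness function compactly rather than arrangement by arrangement.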