Nobody knows for sure what the next blockbuster computational substrate will be, but we do know that we’re nowhere near the limits imposed by the laws of physics. My MIT colleague Seth Lloyd has worked out what this fundamental limit is, and as we’ll explore in greater detail in chapter 6, this limit is a whopping 33 orders of magnitude (10 33times) beyond today’s state of the art for how much computing a clump of matter can do. So even if we keep doubling the power of our computers every couple of years, it will take over two centuries until we reach that final frontier.
Although all universal computers are capable of the same computations, some are more efficient than others. For example, a computation requiring millions of multiplications doesn’t require millions of separate multiplication modules built from separate transistors as in figure 2.6: it needs only one such module, since it can use it many times in succession with appropriate inputs. In this spirit of efficiency, most modern computers use a paradigm where computations are split into multiple time steps, during which information is shuffled back and forth between memory modules and computation modules. This computational architecture was developed between 1935 and 1945 by computer pioneers including Alan Turing, Konrad Zuse, Presper Eckert, John Mauchly and John von Neumann. More specifically, the computer memory stores both data and software (a program, i.e., a list of instructions for what to do with the data). At each time step, a central processing unit (CPU) executes the next instruction in the program, which specifies some simple function to apply to some part of the data. The part of the computer that keeps track of what to do next is merely another part of its memory, called the program counter, which stores the current line number in the program. To go to the next instruction, simply add one to the program counter. To jump to another line of the program, simply copy that line number into the program counter—this is how so-called “if” statements and loops are implemented.
Today’s computers often gain additional speed by parallel processing, which cleverly undoes some of this reuse of modules: if a computation can be split into parts that can be done in parallel (because the input of one part doesn’t require the output of another), then the parts can be computed simultaneously by different parts of the hardware.
The ultimate parallel computer is a quantum computer . Quantum computing pioneer David Deutsch controversially argues that “quantum computers share information with huge numbers of versions of themselves throughout the multiverse,” and can get answers faster here in our Universe by in a sense getting help from these other versions.4 We don’t yet know whether a commercially competitive quantum computer can be built during the coming decades, because it depends both on whether quantum physics works as we think it does and on our ability to overcome daunting technical challenges, but companies and governments around the world are betting tens of millions of dollars annually on the possibility. Although quantum computers cannot speed up run-of-the-mill computations, clever algorithms have been developed that may dramatically speed up specific types of calculations, such as cracking cryptosystems and training neural networks. A quantum computer could also efficiently simulate the behavior of quantum-mechanical systems, including atoms, molecules and new materials, replacing measurements in chemistry labs in the same way that simulations on traditional computers have replaced measurements in wind tunnels.
What Is Learning?
Although a pocket calculator can crush me in an arithmetic contest, it will never improve its speed or accuracy, no matter how much it practices. It doesn’t learn: for example, every time I press its square-root button, it computes exactly the same function in exactly the same way. Similarly, the first computer program that ever beat me at chess never learned from its mistakes, but merely implemented a function that its clever programmer had designed to compute a good next move. In contrast, when Magnus Carlsen lost his first game of chess at age five, he began a learning process that made him the World Chess Champion eighteen years later.
The ability to learn is arguably the most fascinating aspect of general intelligence. We’ve already seen how a seemingly dumb clump of matter can remember and compute, but how can it learn? We’ve seen that finding the answer to a difficult question corresponds to computing a function, and that appropriately arranged matter can calculate any computable function. When we humans first created pocket calculators and chess programs, we did the arranging. For matter to learn, it must instead rearrange itself to get better and better at computing the desired function—simply by obeying the laws of physics.
To demystify the learning process, let’s first consider how a very simple physical system can learn the digits of π and other numbers. Above we saw how a surface with many valleys (see figure 2.3) can be used as a memory device: for example, if the bottom of one of the valleys is at position x = π ≈ 3 . 14159 and there are no other valleys nearby, then you can put a ball at x = 3 and watch the system compute the missing decimals by letting the ball roll down to the bottom. Now, suppose that the surface is made of soft clay and starts out completely flat, as a blank slate. If some math enthusiasts repeatedly place the ball at the locations of each of their favorite numbers, then gravity will gradually create valleys at these locations, after which the clay surface can be used to recall these stored memories. In other words, the clay surface has learned to compute digits of numbers such as π .
Other physical systems, such as brains, can learn much more efficiently based on the same idea. John Hopfield showed that his above-mentioned network of interconnected neurons can learn in an analogous way: if you repeatedly put it into certain states, it will gradually learn these states and return to them from any nearby state. If you’ve seen each of your family members many times, then memories of what they look like can be triggered by anything related to them.
Neural networks have now transformed both biological and artificial intelligence, and have recently started dominating the AI subfield known as machine learning (the study of algorithms that improve through experience). Before delving deeper into how such networks can learn, let’s first understand how they can compute. A neural network is simply a group of interconnected neurons that are able to influence each other’s behavior. Your brain contains about as many neurons as there are stars in our Galaxy: in the ballpark of a hundred billion. On average, each of these neurons is connected to about a thousand others via junctions called synapses, and it’s the strengths of these roughly hundred trillion synapse connections that encode most of the information in your brain.
We can schematically draw a neural network as a collection of dots representing neurons connected by lines representing synapses (see figure 2.9). Real-world neurons are very complicated electrochemical devices looking nothing like this schematic illustration: they involve different parts with names such as axons and dendrites, there are many different kinds of neurons that operate in a wide variety of ways, and the exact details of how and when electrical activity in one neuron affects other neurons is still the subject of active study. However, AI researchers have shown that neural networks can still attain human-level performance on many remarkably complex tasks even if one ignores all these complexities and replaces real biological neurons with extremely simple simulated ones that are all identical and obey very simple rules. The currently most popular model for such an artificial neural network represents the state of each neuron by a single number and the strength of each synapse by a single number. In this model, each neuron updates its state at regular time steps by simply averaging together the inputs from all connected neurons, weighting them by the synaptic strengths, optionally adding a constant, and then applying what’s called an activation function to the result to compute its next state. *5The easiest way to use a neural network as a function is to make it feedforward, with information flowing only in one direction, as in figure 2.9, plugging the input to the function into a layer of neurons at the top and extracting the output from a layer of neurons at the bottom.
Читать дальше