In those early days, Google lived at google.stanford.edu, and Brin and Page were convinced it should be nonprofit and advertising-free. “We expect that advertising funded search engines will be inherently biased towards the advertisers and away from the needs of the consumers,” they wrote. “The better the search engine is, the fewer advertisements will be needed for the consumer to find what they want…. We believe the issue of advertising causes enough mixed incentives that it is crucial to have a competitive search engine that is transparent and in the academic realm.”
But when they released the beta site into the wild, the traffic chart went vertical. Google worked—out of the box, it was the best search site on the Internet. Soon, the temptation to spin it off as a business was too great for the twenty-something cofounders to bear.
In the Google mythology, it is PageRank that drove the company to worldwide dominance. I suspect the company likes it that way—it’s a simple, clear story that hangs the search giant’s success on a single ingenious breakthrough by one of its founders. But from the beginning, PageRank was just a small part of the Google project. What Brin and Page had really figured out was this: The key to relevance, the solution to sorting through the mass of data on the Web was… more data.
It wasn’t just which pages linked to which that Brin and Page were interested in. The position of a link on the page, the size of the link, the age of the page—all of these factors mattered. Over the years, Google has come to call these clues embedded in the data signals.
From the beginning, Page and Brin realized that some of the most important signals would come from the search engine’s users. If someone searches for “Larry Page,” say, and clicks on the second link, that’s another kind of vote: It suggests that the second link is more relevant to that searcher than the first one. They called this a click signal. “Some of the most interesting research,” Page and Brin wrote, “will involve leveraging the vast amount of usage data that is available from modern web systems…. It is very difficult to get this data, mainly because it is considered commercially valuable.” Soon they’d be sitting on one of the world’s largest stores of it.
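The click-signal idea can be sketched in a few lines of code. This is purely illustrative, not Google's actual algorithm: the class name, the 0.1 weight, and the blending formula are all invented here to show how clicks on a lower-ranked result could nudge it upward for future searchers.

```python
# Illustrative sketch of a click signal: clicks on lower-ranked results
# raise those results' scores on future queries. All names and weights
# here are hypothetical, not Google's actual ranking code.
from collections import defaultdict

class ClickSignalRanker:
    def __init__(self):
        # query -> url -> number of recorded clicks
        self.clicks = defaultdict(lambda: defaultdict(int))

    def record_click(self, query, url):
        self.clicks[query][url] += 1

    def rerank(self, query, results):
        # results: list of (url, base_relevance_score) pairs.
        # Blend the base score with observed clicks for this query.
        def score(item):
            url, base = item
            return base + 0.1 * self.clicks[query][url]
        return [url for url, _ in sorted(results, key=score, reverse=True)]

ranker = ClickSignalRanker()
results = [("pagerank-paper.html", 1.0), ("larrypage-bio.html", 0.9)]
# Searchers repeatedly click the second-ranked biography page...
for _ in range(5):
    ranker.record_click("Larry Page", "larrypage-bio.html")
# ...so it overtakes the first result on subsequent searches.
print(ranker.rerank("Larry Page", results))
```

The point of the sketch is the feedback loop: every click is a vote, and enough votes reorder the list, with no human judgment in between.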
Where data was concerned, Google was voracious. Brin and Page were determined to keep everything: every Web page the search engine had ever landed on, every click every user ever made. Soon its servers contained a nearly real-time copy of most of the Web. By sifting through this data, they were certain they’d find more clues, more signals, that could be used to tweak results. The search-quality division at the company acquired a black-ops kind of feel: few visitors and absolute secrecy were the rule.
“The ultimate search engine,” Page was fond of saying, “would understand exactly what you mean and give back exactly what you want.” Google didn’t want to return thousands of pages of links—it wanted to return one, the one you wanted. But the perfect answer for one person isn’t perfect for another. When I search for “panthers,” what I probably mean are the large wild cats, whereas a football fan searching for the same term probably means the Carolina team. To provide perfect relevance, you’d need to know what each of us was interested in. You’d need to know that I’m pretty clueless about football; you’d need to know who I was.
The challenge was getting enough data to figure out what’s personally relevant to each user. Understanding what someone means is tricky business—and to do it well, you have to get to know a person’s behavior over a sustained period of time.
But how? In 2004, Google came up with an innovative strategy. It started providing other services, services that required users to log in. Gmail, its hugely popular e-mail service, was one of the first to roll out. The press focused on the ads that ran along Gmail’s sidebar, but it’s unlikely that those ads were the sole motive for launching the service. By getting people to log in, Google got its hands on an enormous pile of data—the hundreds of millions of e-mails Gmail users send and receive each day. And it could cross-reference each user’s e-mail and behavior on the site with the links he or she clicked in the Google search engine. Google Apps—a suite of online word-processing and spreadsheet-creation tools—served double duty: It undercut Microsoft, Google’s sworn enemy, and it provided yet another hook for people to stay logged in and continue sending click signals. All this data allowed Google to accelerate the process of building a theory of identity for each user—what topics each user was interested in, what links each person clicked.
By November 2008, Google had several patents for personalization algorithms—code that could figure out the groups to which an individual belongs and tailor his or her results to suit that group’s preferences. The categories Google had in mind were pretty narrow: to illustrate, the patent used the example of “all persons interested in collecting ancient shark teeth” and “all persons not interested in collecting ancient shark teeth.” People in the former category who searched for, say, “Great White incisors” would get different results from those in the latter.
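The mechanism the patent describes reduces to a simple idea: bucket users by interest, then let the same query return different results per bucket. The sketch below is a hypothetical toy version—the group names, URLs, and lookup logic are all invented for illustration.

```python
# Hypothetical toy version of group-based personalization: the same
# query yields different result lists depending on the searcher's
# interest group. Group names, URLs, and data are invented.
GROUP_RESULTS = {
    "shark-teeth-collectors": [
        "fossil-shark-teeth-guide.html",
        "megalodon-teeth-for-sale.html",
    ],
    "everyone-else": [
        "great-white-shark-facts.html",
        "dental-anatomy.html",
    ],
}

def personalize(query, user_groups):
    # Return results for the first interest group the user belongs to,
    # falling back to generic results for everyone else.
    for group in user_groups:
        if group in GROUP_RESULTS:
            return GROUP_RESULTS[group]
    return GROUP_RESULTS["everyone-else"]

# A collector and a non-collector issue the identical query...
print(personalize("Great White incisors", ["shark-teeth-collectors"]))
print(personalize("Great White incisors", []))
```

Even in this toy form, the key property is visible: the query alone no longer determines the answer—the inferred identity of the asker does.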
Today, Google monitors every signal about us it can get its hands on. The power of this data can’t be overestimated: If Google sees that I log on first from New York, then from San Francisco, then from New York again, it knows I’m a bicoastal traveler and can adjust its results accordingly. By looking at what browser I use, it can make some guesses about my age and perhaps even my politics.
How much time you take between the moment you enter your query and the moment you click on a result sheds light on your personality. And of course, the terms you search for reveal a tremendous amount about your interests.
Even if you’re not logged in, Google is personalizing your search. The neighborhood—even the block—that you’re logging in from is available to Google, and it says a lot about who you are and what you’re interested in. A query for “Sox” coming from Wall Street is probably shorthand for the financial legislation “Sarbanes-Oxley,” while across the Upper Bay in Staten Island it’s probably about baseball.
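That kind of location-based disambiguation can be sketched directly. Again, this is an assumption-laden illustration, not real search code: the function, the neighborhood list, and the two interpretations are invented to mirror the “Sox” example.

```python
# Illustrative sketch (not Google's code): even without a login, the
# query's origin location can disambiguate an ambiguous search term.
def interpret_sox(neighborhood):
    # Hypothetical mapping from neighborhood to the likelier meaning
    # of a query for "Sox".
    financial_districts = {"Wall Street", "Financial District"}
    if neighborhood in financial_districts:
        return "Sarbanes-Oxley Act"
    return "baseball (Red Sox / White Sox)"

print(interpret_sox("Wall Street"))
print(interpret_sox("Staten Island"))
```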
“People always make the assumption that we’re done with search,” said founder Page in 2009. “That’s very far from the case. We’re probably only 5 percent of the way there. We want to create the ultimate search engine that can understand anything…. Some people could call that artificial intelligence.”
In 2006, at an event called Google Press Day, CEO Eric Schmidt laid out Google’s five-year plan. One day, he said, Google would be able to answer questions such as “Which college should I go to?” “It will be some years before we can at least partially answer those questions. But the eventual outcome is… that Google can answer a more hypothetical question.”
Google’s algorithms were unparalleled, but the challenge was to coax users into revealing their tastes and interests. In February 2004, working out of his Harvard dorm room, Mark Zuckerberg came up with an easier approach. Rather than sifting through click signals to figure out what people cared about, the plan behind his creation, Facebook, was to just flat out ask them.
Since he was a college freshman, Zuckerberg had been interested in what he called the “social graph”—the set of each person’s relationships. Feed a computer that data, and it could start to do some pretty interesting and useful things—telling you what your friends were up to, where they were, and what they were interested in. It also had implications for news: In its earliest incarnation as a Harvard-only site, Facebook automatically annotated people’s personal pages with links to the Crimson articles in which they appeared.