Especially once the second click has occurred, your brain is in on the act as well. Our brains act to reduce cognitive dissonance in a strange but compelling kind of unlogic—“Why would I have done x if I weren’t a person who does x? Therefore I must be a person who does x.” Each click you take in this loop is another act of self-justification—“Boy, I guess I just really love ‘Crazy Train.’” When you use a recursive process that feeds on itself, Cohler tells me, “You’re going to end up down a deep and narrow path.” The reverb drowns out the tune. If identity loops aren’t counteracted through randomness and serendipity, you could end up stuck in the foothills of your identity, far away from the high peaks in the distance.
And that’s when these loops are relatively benign. Sometimes they’re not.
We know what happens when teachers think students are dumb: They get dumber. In an experiment done before the advent of ethics boards, teachers were given test results that supposedly indicated the IQ and aptitude of students entering their classes. They weren’t told, however, that the results had been randomly redistributed among students. After a year, the students who the teachers had been told were bright made big gains in IQ. The students who the teachers had been told were below average had no such improvement.
So what happens when the Internet thinks you’re dumb? Personalization based on perceived IQ isn’t such a far-fetched scenario—Google Docs even offers a helpful tool for automatically checking the grade-level of written text. If your education level isn’t already available through a tool like Acxiom, it’s easy enough for anyone with access to a few e-mails or Facebook posts to infer. Users whose writing indicates college-level literacy might see more articles from the New Yorker; users with only basic writing skills might see more from the New York Post.
In a broadcast world, everyone is expected to read or process information at about the same level. In the filter bubble, there’s no need for that expectation. On one hand, this could be great—vast groups of people who have given up on reading because the newspaper goes over their heads may finally connect with written content. But without pressure to improve, it’s also possible to get stuck in a grade-three world for a long time.
In some cases, letting algorithms make decisions about what we see and what opportunities we’re offered gives us fairer results. A computer can be made blind to race and gender in ways that humans usually can’t. But that’s only if the relevant algorithms are designed with care and acuteness. Otherwise, they’re likely to simply reflect the social mores of the culture they’re processing—a regression to the social norm.
In some cases, algorithmic sorting based on personal data can be even more discriminatory than people would be. For example, software that helps companies sift through résumés for talent might “learn” by looking at which of its recommended employees are actually hired. If nine white candidates in a row are chosen, it might determine that the company isn’t interested in hiring black people and exclude them from future searches. “In many ways,” writes NYU sociologist Dalton Conley, “such network-based categorizations are more insidious than the hackneyed groupings based on race, class, gender, religion, or any other demographic characteristic.” Among programmers, this kind of error has a name. It’s called overfitting.
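To make the mechanism concrete, here is a deliberately naive sketch of a screener that “learns” only from past hiring outcomes. The data, names, and threshold are invented for illustration; no real vendor’s software looks exactly like this. The point is how a short, biased run of decisions hardens into a permanent filter.

```python
# Deliberately naive "learning" screener: it generalizes from a tiny sample of
# past hiring outcomes, so a short biased streak becomes a hard filter.
from collections import defaultdict

# Hypothetical history: which recommended candidates were actually hired.
past_outcomes = [
    {"race": "white", "hired": True},
    {"race": "white", "hired": True},
    {"race": "white", "hired": True},
    {"race": "black", "hired": False},  # a single rejection in a tiny sample...
]

def learn_hire_rates(outcomes):
    """Hire rate per group, computed from whatever little history exists."""
    hires, totals = defaultdict(int), defaultdict(int)
    for record in outcomes:
        totals[record["race"]] += 1
        hires[record["race"]] += record["hired"]
    return {group: hires[group] / totals[group] for group in totals}

def screen(candidates, rates, threshold=0.5):
    """Drop anyone from a group whose observed hire rate falls below the threshold."""
    return [c for c in candidates if rates.get(c["race"], 1.0) >= threshold]

rates = learn_hire_rates(past_outcomes)
applicants = [{"name": "A", "race": "white"}, {"name": "B", "race": "black"}]
print(screen(applicants, rates))  # candidate B never reaches a human reviewer
```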
The online movie rental Web site Netflix is powered by an algorithm called CineMatch. To start, it was pretty simple. If I had rented the first movie in the Lord of the Rings trilogy, let’s say, Netflix could look up what other movies Lord of the Rings watchers had rented. If many of them had rented Star Wars, it’d be highly likely that I would want to rent it, too.
This technique is called kNN (k-nearest-neighbor), and using it CineMatch got pretty good at figuring out what movies people wanted to watch based on what movies they’d rented and how many stars (out of five) they’d given the movies they’d seen. By 2006, CineMatch could predict within one star how much a given user would like any movie from Netflix’s vast hundred-thousand-film emporium. Already CineMatch was better at making recommendations than most humans. A human video clerk would never think to suggest Silence of the Lambs to a fan of The Wizard of Oz, but CineMatch knew people who liked one usually liked the other.
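The mechanics are simple enough to sketch. The snippet below is a toy illustration of user-based k-nearest-neighbor prediction, with invented ratings, user names, and a cosine similarity measure; it is not Netflix’s actual CineMatch code, just the general technique: find the viewers whose past ratings look most like yours, and let their scores for a movie you haven’t seen stand in for your own.

```python
# Toy user-based k-nearest-neighbor rating predictor (illustrative only, not CineMatch).
# Each user is a dict of {movie: stars}; similarity is cosine over co-rated movies.
from math import sqrt

ratings = {
    "alice": {"Lord of the Rings": 5, "Star Wars": 5, "Wizard of Oz": 2},
    "bob":   {"Lord of the Rings": 4, "Star Wars": 5, "Silence of the Lambs": 3},
    "carol": {"Wizard of Oz": 5, "Silence of the Lambs": 5, "Star Wars": 1},
    "dave":  {"Lord of the Rings": 5, "Star Wars": 4, "Wizard of Oz": 1},
}

def similarity(a, b):
    """Cosine similarity computed over the movies both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[m] * b[m] for m in shared)
    return dot / (sqrt(sum(a[m] ** 2 for m in shared)) *
                  sqrt(sum(b[m] ** 2 for m in shared)))

def predict(user, movie, k=2):
    """Predict a star rating as the similarity-weighted average of the k nearest raters."""
    me = ratings[user]
    neighbors = sorted(
        ((similarity(me, other), other[movie])
         for name, other in ratings.items()
         if name != user and movie in other),
        reverse=True,
    )[:k]
    if not neighbors:
        return None
    total = sum(sim for sim, _ in neighbors)
    if total == 0:
        return sum(stars for _, stars in neighbors) / len(neighbors)
    return sum(sim * stars for sim, stars in neighbors) / total

# Alice has never rated Silence of the Lambs; her nearest neighbors have.
print(round(predict("alice", "Silence of the Lambs"), 1))
```

The prediction for Alice lands between Bob’s and Carol’s ratings, pulled toward whichever neighbor’s taste more closely matches hers.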
But Reed Hastings, Netflix’s CEO, wasn’t satisfied. “Right now, we’re driving the Model-T version of what’s possible,” he told a reporter in 2006. On October 2, 2006, an announcement went up on the Netflix Web site: “We’re interested, to the tune of $1 million.” Netflix had posted an enormous swath of data—reviews, rental records, and other information from its user database, scrubbed of anything that would obviously identify a specific user. And now the company was willing to give $1 million to the person or team who beat CineMatch by more than 10 percent. Like the longitude prize, the Netflix Challenge was open to everyone. “All you need is a PC and some great insight,” Hastings declared in the New York Times.
After nine months, about eighteen thousand teams from more than 150 countries were competing, using ideas from machine learning, neural networks, collaborative filtering, and data mining. Usually, contestants in high-stakes contests operate in secret. But Netflix encouraged the competing groups to communicate with one another and built a message board where they could coordinate around common obstacles. Read through the message board, and you get a visceral sense of the challenges that bedeviled the contestants during the three-year quest for a better algorithm. Overfitting comes up again and again.
There are two challenges in building pattern-finding algorithms. One is finding the patterns that really are there in all the noise. The other is avoiding the opposite mistake: seeing patterns in the data that aren’t actually there. The pattern that describes “1, 2, 3” could be “add one to the previous number” or “each number is the sum of the two before it.” You don’t know for sure until you get more data. And if you leap to conclusions, you’re overfitting.
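The sequence example can be made concrete in a few lines. This is my own toy sketch, not anything from the Netflix contest: two rules that agree perfectly on the observed data and then disagree the moment a fourth number is asked for.

```python
# Two candidate "patterns" that both fit the observed sequence 1, 2, 3 exactly.
observed = [1, 2, 3]

def add_one(prefix):
    """Rule A: each term is one more than the previous term."""
    return prefix[-1] + 1

def sum_of_last_two(prefix):
    """Rule B: each term is the sum of the two terms before it."""
    return prefix[-2] + prefix[-1]

# Both rules reproduce the third term from the first two...
assert add_one([1, 2]) == 3 and sum_of_last_two([1, 2]) == 3

# ...but they diverge as soon as more data is demanded of them.
print(add_one(observed))          # 4
print(sum_of_last_two(observed))  # 5
```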
Where movies are concerned, the dangers of overfitting are relatively small—many analog movie watchers have been led to believe that because they liked The Godfather and The Godfather: Part II, they’ll like The Godfather: Part III. But the overfitting problem gets to one of the central, irreducible problems of the filter bubble: Overfitting and stereotyping are synonyms.
The term stereotyping (which in this sense comes from Walter Lippmann, incidentally) is often used to refer to malicious xenophobic patterns that aren’t true—“people of this skin color are less intelligent” is a classic example. But stereotypes and the negative consequences that flow from them aren’t fair to specific people even if they’re generally pretty accurate.
Marketers are already exploring the gray area between what can be predicted and what predictions are fair. According to Charlie Stryker, an old hand in the behavioral targeting industry who spoke at the Social Graph Symposium, the U.S. Army has had terrific success using social-graph data to recruit for the military—after all, if six of your Facebook buddies have enlisted, it’s likely that you would consider doing so too. Drawing inferences based on what people like you or people linked to you do is pretty good business. And it’s not just the army. Banks are beginning to use social data to decide to whom to offer loans: If your friends don’t pay on time, it’s likely that you’ll be a deadbeat too. “A decision is going to be made on creditworthiness based on the creditworthiness of your friends,” Stryker said. “There are applications of this technology that can be very powerful,” another social targeting entrepreneur told the Wall Street Journal. “Who knows how far we’d take it?”
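A minimal sketch of the kind of inference Stryker describes might look like the following. The scores, names, and data are invented for illustration; no bank’s actual model is shown. The person’s own record never enters the calculation: they are judged entirely by the company they keep.

```python
# Hypothetical social-graph credit inference: score a person by averaging
# their friends' repayment histories rather than their own record.
friends = {
    "erin": ["frank", "grace", "heidi"],
}
repayment_rate = {"frank": 0.95, "grace": 0.40, "heidi": 0.55}  # invented data

def social_score(person):
    """Mean repayment rate of the person's friends (their own history is ignored)."""
    circle = friends[person]
    return sum(repayment_rate[f] for f in circle) / len(circle)

print(round(social_score("erin"), 2))  # about 0.63, whatever erin's own record says
```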