Even more interesting, the process doesn’t end when you find a car, a house, a doctor, a date, or a job. Your digital half is continually learning from its experiences, just as you would. It figures out what works and what doesn’t, whether it’s in job interviews, dating, or real-estate hunting. It learns about the people and organizations it interacts with on your behalf and then (even more important) from your real-world interactions with them. It predicted Alice would be a great date for you, but you had an awkward time, so it hypothesizes possible reasons, which it will test on your next round of dating. It shares its most important findings with you. (“You believe you like X, but in reality you tend to go for Y.”) Comparing your experiences of various hotels with their reviews on TripAdvisor, it figures out what the really telling tidbits are and looks for them in the future. It learns not just which online merchants are more trustworthy but how to decode what the less trustworthy ones say.

Your digital half has a model of the world: not just of the world in general but of the world as it relates to you. At the same time, of course, everyone else also has a continually evolving model of his or her world. Every party to an interaction learns from it and applies what it’s learned to its next interactions. You have your model of every person and organization you interact with, and they each have their model of you. As the models improve, their interactions become more and more like the ones you would have in the real world, except millions of times faster and in silicon. Tomorrow’s cyberspace will be a vast parallel world that selects only the most promising things to try out in the real one. It will be like a new, global subconscious, the collective id of the human race.
To share or not to share, and how and where
Of course, learning about the world all by yourself is slow, even if your digital half does it orders of magnitude faster than the flesh-and-blood you. If others learn about you faster than you learn about them, you’re in trouble. The answer is to share: a million people learn about a company or product a lot faster than a single one does, provided they pool their experiences. But who should you share data with? That’s perhaps the most important question of the twenty-first century.
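To put a number on the advantage of pooling, here is a minimal sketch (the true quality and noise level below are made up for illustration): averaging many people’s noisy experiences of the same product drives the estimation error down roughly as one over the square root of the size of the pool, so a million pooled experiences are about a thousand times more precise than one.

```python
import random
import statistics

random.seed(0)
TRUE_QUALITY = 4.2   # hypothetical "true" quality of a product
NOISE = 1.0          # spread of any one person's impression of it

def pooled_estimate(n):
    """Average the noisy experiences of n people."""
    return statistics.fmean(TRUE_QUALITY + random.gauss(0, NOISE) for _ in range(n))

for n in (1, 100, 10_000, 1_000_000):
    estimate = pooled_estimate(n)
    print(f"{n:>9,} people -> estimate {estimate:.3f}, error {abs(estimate - TRUE_QUALITY):.3f}")
```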
Today your data can be of four kinds: data you share with everyone, data you share with friends or coworkers, data you share with various companies (wittingly or not), and data you don’t share. The first type includes things like Yelp, Amazon, and TripAdvisor reviews, eBay feedback scores, LinkedIn résumés, blogs, tweets, and so on. This data is very valuable and is the least problematic of the four. You make it available to everyone because you want to, and everyone benefits. The only problem is that the companies hosting the data don’t necessarily allow it to be downloaded in bulk for building models. They should. Today you can go to TripAdvisor and see the reviews and star ratings of particular hotels you’re considering, but what about a model of what makes a hotel good or bad in general, which you could use to rate hotels that currently have few or no reliable reviews? TripAdvisor could learn it, but what about a model of what makes a hotel good or bad for you? This requires information about you that you may not want to share with TripAdvisor. What you’d like is a trusted party that combines the two types of data and gives you the results.
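Here is a minimal sketch of what that trusted party might compute; the feature names, weights, and mixing parameter are all hypothetical, since the book describes the idea rather than an implementation. The point is that a general model learned from everyone’s reviews and a personal model learned from your private history can be blended without the review site ever seeing your data:

```python
# Hypothetical weights: what the crowd thinks makes a hotel good,
# learned from public reviews, versus what your private history says.
GENERAL_WEIGHTS = {"cleanliness": 0.5, "location": 0.3, "quiet": 0.2}
PERSONAL_WEIGHTS = {"cleanliness": 0.2, "location": 0.1, "quiet": 0.7}

def personalized_score(hotel_features, mix=0.5):
    """Blend the crowd's notion of 'good' with yours. mix=1 means crowd only."""
    general = sum(GENERAL_WEIGHTS[f] * v for f, v in hotel_features.items())
    personal = sum(PERSONAL_WEIGHTS[f] * v for f, v in hotel_features.items())
    return mix * general + (1 - mix) * personal

# A hotel with sparse reviews can still be scored from its features.
print(personalized_score({"cleanliness": 0.9, "location": 0.6, "quiet": 0.3}))
```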
The second kind of data should also be unproblematic, but it isn’t because it overlaps with the third. You share updates and pictures with your friends on Facebook, and they with you. But everyone shares their updates and pictures with Facebook. Lucky Facebook: it has a billion friends. Day by day, it learns a lot more about the world than any one person does. It would learn even more if it had better algorithms, and they are getting better every day, courtesy of us data scientists. Facebook’s main use for all this knowledge is to target ads to you. In return, it provides the infrastructure for your sharing. That’s the bargain you make when you use Facebook. As its learning algorithms improve, it gets more and more value out of the data, and some of that value returns to you in the form of more relevant ads and better service. The only problem is that Facebook is also free to do things with the data and the models that are not in your interest, and you have no way to stop it.
This problem pops up across the board with data you share with companies, which these days includes pretty much everything you do online as well as a lot of what you do offline. In case you haven’t noticed, there’s a mad race to gather data about you. Everybody loves your data, and no wonder: it’s the gateway to your world, your money, your vote, even your heart. But everyone has only a sliver of it. Google sees your searches, Amazon your online purchases, AT&T your phone calls, Apple your music downloads, Safeway your groceries, Capital One your credit-card transactions. Companies like Acxiom collate and sell information about you, but if you inspect it (which in Acxiom’s case you can, at aboutthedata.com), it’s not much, and some of it is wrong. No one has anything even approaching a complete picture of you. That’s both good and bad. Good because if someone did, they’d have far too much power. Bad because as long as that’s the case there can be no 360-degree model of you. What you really want is a digital you that you’re the sole owner of and that others can access only on your terms.
The last type of data, the data you don’t share, also has a problem, which is that maybe you should share it. Maybe it hasn’t occurred to you to do so, maybe there’s no easy way to, or maybe you just don’t want to. In the last case, you should consider whether you have an ethical responsibility to share. One example we’ve seen is cancer patients, who can contribute to curing cancer by sharing their tumors’ genomes and treatment histories. But it goes well beyond that. All sorts of questions about society and policy can potentially be answered by learning from the data we generate in our daily lives. Social science is entering a golden age, where it finally has data commensurate with the complexity of the phenomena it studies, and the benefits to all of us could be enormous, provided the data is accessible to researchers, policy makers, and citizens. This does not mean letting others peek into your private life; it means letting them see the learned models, which should contain only statistical information. So between you and them there needs to be an honest data broker that guarantees your data won’t be misused, but also that no free riders share the benefits without sharing the data.
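The text doesn’t say how a broker would release “only statistical information,” but differential privacy is one standard way to make that guarantee precise. Here is an illustrative sketch (the epsilon value and the per-patient scores are made up): the broker adds calibrated Laplace noise to an aggregate, so no individual record can be inferred from the answer.

```python
import random

def private_mean(values, lo, hi, epsilon=0.5):
    """Release a mean with Laplace noise calibrated to hide any one record."""
    clipped = [min(max(v, lo), hi) for v in values]
    true_mean = sum(clipped) / len(clipped)
    sensitivity = (hi - lo) / len(clipped)   # max influence of a single person
    scale = sensitivity / epsilon
    # The difference of two exponentials is a Laplace sample with that scale.
    noise = random.expovariate(1 / scale) - random.expovariate(1 / scale)
    return true_mean + noise

outcomes = [0.7, 0.4, 0.9, 0.6, 0.8]   # hypothetical per-patient treatment scores
print(private_mean(outcomes, lo=0.0, hi=1.0))
```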
In sum, all four kinds of data sharing have problems. These problems all have a common solution: a new type of company that is to your data what your bank is to your money. Banks don’t steal your money (with rare exceptions). They’re supposed to invest it wisely, and your deposits are FDIC-insured. Many companies today offer to consolidate your data somewhere in the cloud, but they’re still a far cry from your personal data bank. If they’re cloud providers, they try to lock you in, which is a big no-no. (Imagine depositing your money with Bank of America and not knowing if you’ll be able to transfer it to Wells Fargo somewhere down the line.) Some startups offer to hoard your data and then mete it out to advertisers in return for discounts, but to me that misses the point. Sometimes you want to give information to advertisers for free because it’s in your interests, sometimes you don’t want to give it at all, and what to share when is a problem that only a good model of you can solve.
The kind of company I’m envisaging would do several things in return for a subscription fee. It would anonymize your online interactions, routing them through its servers and aggregating them with those of its other users. It would store all the data from all your life in one place, down to your 24/7 Google Glass video stream, if you ever get one. It would learn a complete model of you and your world and continually update it. And it would use the model on your behalf, always doing exactly what you would, to the best of the model’s ability. The company’s basic commitment to you is that your data and your model will never be used against your interests. Such a guarantee can never be foolproof; you yourself are not guaranteed never to do anything against your interests, after all. But the company’s life would depend on it as much as a bank’s depends on the guarantee that it won’t lose your money, so you should be able to trust it as much as you trust your bank.
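As a sketch of the contract such a company might expose (entirely hypothetical; the text specifies the guarantees, not an API), the key design choice is that raw data stays inside the data bank and only the model’s recommendations ever come out:

```python
class PersonalDataBank:
    """Toy data bank: deposits stay private; only model outputs are exposed."""

    def __init__(self):
        self._raw = []     # your lifetime data; never leaves this object
        self._model = {}   # a crude stand-in for the learned model of you

    def deposit(self, record):
        """Store one interaction and fold it into the model (moving average)."""
        self._raw.append(record)
        for key, value in record.items():
            self._model[key] = 0.9 * self._model.get(key, value) + 0.1 * value

    def recommend(self, options):
        """Act on your behalf: score options against the model of you."""
        def score(option):
            return sum(v * self._model.get(k, 0) for k, v in option.items())
        return max(options, key=score)

bank = PersonalDataBank()
bank.deposit({"quiet": 0.9, "price_sensitivity": 0.4})
bank.deposit({"quiet": 0.8, "price_sensitivity": 0.6})
print(bank.recommend([{"quiet": 1.0, "price_sensitivity": 0.2},
                      {"quiet": 0.3, "price_sensitivity": 0.9}]))
```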