Arguments for and against the use of "Big Data" to tailor services and advertisements litter the blogosphere, and many privacy advocates declare that using a person's data is a breach of privacy, regardless of the benefits, anonymity, and aggregation methods used. However, others argue that without this treasure trove of personal data, many of the tools society depends on would be inconceivable.
One of the most prominent examples is the Global Positioning System (GPS). While GPS was originally developed by the Department of Defense, it was opened to civilian use for the public good after the tragic downing of Korean Air Lines Flight 007. Since that decision in 1983, the GPS we utilize today has become almost unrecognizable from its origins, even though the underlying mechanics have remained largely unchanged. This evolution was only made possible through the use of personal data collected from millions of Americans who speedily click through exhaustive End User License Agreements and Privacy Policies without reading them; as a result, many of these participants may be surprised to learn how important they were to the evolution of GPS. This evolution has gone by different names over the years, the two most iconic being the "Geospatial Revolution" coined by Penn State and, later, "The Social Layer" by Google.
This layering of raw public data with the personal data of millions has led to incredible technological breakthroughs that mirror the evolution of GPS. These breakthrough technologies have revolutionized the American way of life so completely that their sudden departure would make day-to-day life foreign to almost everyone in the country. Some of these technologies are springing up in clearly delineated, emerging markets that are easy enough to identify, but others are so seamlessly embedded in daily life that the average American may be shocked by their reliance on personal data.
The invention of Uber, a ride-sharing app, has given people the ability to turn their idle commutes into fare-generating commercial trips. Numerous supporting apps, such as SherpaShare, have been created to layer additional data on top of Uber's, aiming to increase drivers' effectiveness.
Arguably more important than the creation of Uber (or its supporting applications) is the market that it created. A competing company, Lyft, found its way to prominence shortly after, and Airbnb, a similarly themed company aimed at renting out unused guest bedrooms, appeared shortly thereafter. These companies could not exist without the layered private data that helps these drivers, hosts, and other participants in the "sharing economy" make financially sound decisions based on the travel, shopping, and tourism habits of millions of people throughout the country.
Even more surprising to the average American is that the first responder who may one day save your life, recover a family heirloom, or provide you with toiletries after a natural disaster may rely so heavily on this layered data that their job would be nearly impossible without it.
In fact, through applications like Ushahidi, incident commanders are able to work more closely with volunteers and news agencies to aggregate and disseminate vital intelligence: to first responders, who are then able to render life-saving aid more quickly, and to national decision makers and news agencies, who are able to keep the country informed and engaged with the rescue efforts. The result is increased awareness of avoidable disasters, like the Deepwater Horizon spill, through graphics from The New York Times and other outlets, which then hold those responsible accountable for their actions.
However, these revolutions aren't without consequences. In one well-known example, captured by Charles Duhigg in his book The Power of Habit, the national retailer Target predicted the pregnancy of, and sent relevant advertisements to, a teenage girl at such an early stage of her pregnancy that her family, friends, and boyfriend had not yet been told of the new development. The situation caused such an uproar among privacy advocates, and among those put off by the general 'creepiness' of it all, that Target artificially diluted the accuracy of its targeting in order to avoid alienating future customers.
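Duhigg's account describes Target computing a "pregnancy prediction" score from shoppers' purchase histories. The mechanics can be pictured with a toy sketch; the products, weights, and threshold below are invented for illustration and are not Target's actual model:

```python
# Toy illustration of a purchase-based "pregnancy prediction" score.
# Products, weights, and the threshold are invented for illustration;
# the real model reportedly weighed a couple dozen products.
PRODUCT_WEIGHTS = {
    "unscented lotion": 0.30,
    "calcium supplement": 0.25,
    "large tote bag": 0.15,
    "cotton balls": 0.20,
}

def pregnancy_score(purchases):
    """Sum the weights of flagged products in a shopper's basket."""
    return sum(PRODUCT_WEIGHTS.get(item, 0.0) for item in purchases)

def likely_pregnant(purchases, threshold=0.6):
    """Flag the shopper once enough weighted signals accumulate."""
    return pregnancy_score(purchases) >= threshold

basket = ["unscented lotion", "calcium supplement", "cotton balls"]
print(likely_pregnant(basket))  # several flagged items together trip the threshold
```

The unsettling part is not any single purchase but the aggregation: no item in the basket is revealing on its own.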
I pick on Target because predicting that someone is pregnant before the father does is fairly dramatic, but Target isn't the only company doing this, nor is it the only one that is good at it. There are countless controversial stories that cause data scientists and privacy advocates to spar, and when we exclude the extremists, we find the moderates debating an underlying question.
How does a company balance the needs of its bottom line through data mining against the desires of the consumer for relevant products and the consumer's need for privacy?
One problem with this underlying question is that it is layered: it forces the company to balance its own needs against both the consumer's need for privacy and the consumer's desire for relevant products. Unfortunately, many consumers are unable to articulate exactly what their need for privacy is, or how important it is compared to their desire for relevant products.
One easily accessible example of this balancing done well is the streaming service Netflix. In recent years, Netflix has evolved from merely aggregating movies and television shows to producing an increasingly large number of them, with remarkable success. In fact, the success rate of Netflix originals is roughly 72 percent, about twice that of traditionally launched TV shows. The main factor in Netflix's success? Informed decision making.
Netflix has a robust architecture that enables its executives to perform exhaustive data collection on the millions of users who use the service every night. In a fantastic primer for Kissmetrics, Zack Bulygo outlines some of the minutiae of Netflix's data collection efforts, noting some of the things Netflix is able to collect as you use its app:
- When you pause, rewind, or fast forward
- What day you watch content (Netflix has found people watch TV shows during the week and movies during the weekend.)
- The date you watch
- What time you watch content
- Where you watch (zip code)
- What device you use to watch (Do you like to use your tablet for TV shows and your Roku for movies? Do people access the Just for Kids feature more on their iPads, etc.?)
- When you pause and leave content (and if you ever come back)
- The ratings given (about 4 million per day)
- Searches (about 3 million per day)
- Browsing and scrolling behavior
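One way to picture how such an event stream becomes insight, like the weekday-TV/weekend-movie pattern above, is a simple aggregation. The event shape, field names, and data below are my own invention, not Netflix's actual schema:

```python
# Hypothetical viewing events as (content_type, weekday) pairs.
# The schema and data are invented for illustration only.
events = [
    ("tv_show", "Tuesday"), ("tv_show", "Wednesday"), ("tv_show", "Thursday"),
    ("movie", "Saturday"), ("movie", "Sunday"), ("movie", "Saturday"),
]

WEEKEND = {"Saturday", "Sunday"}

def weekend_share(events, content_type):
    """Fraction of views of a given content type that fall on a weekend."""
    days = [day for kind, day in events if kind == content_type]
    return sum(1 for day in days if day in WEEKEND) / len(days)

print(weekend_share(events, "movie"))    # movies skew toward the weekend
print(weekend_share(events, "tv_show"))  # TV shows skew toward weekdays
```

Multiply this kind of rollup across millions of users and dozens of event types, and patterns like "TV during the week, movies on the weekend" fall out of the data.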
This information armed Netflix with the knowledge not only to create "House of Cards" but also to be reasonably assured of its success, and to know which actor (in this case, Kevin Spacey) and which director (David Fincher) to select for the project. They were able to make these decisions because their data mining efforts surfaced the following:
- A lot of users watched the David Fincher-directed movie "The Social Network" from beginning to end.
- The British version of "House of Cards" had been well watched.
- Those who watched the British version of "House of Cards" also watched Kevin Spacey films and/or films directed by David Fincher.
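These three signals can be pictured as overlapping audience sets; the user IDs below are invented, but the set intersection captures the reasoning:

```python
# Hypothetical audience sets; user IDs are invented for illustration.
fincher_fans = {"u1", "u2", "u3", "u5"}     # finished "The Social Network"
uk_hoc_watchers = {"u2", "u3", "u4", "u5"}  # watched the British "House of Cards"
spacey_fans = {"u2", "u3", "u6"}            # watched Kevin Spacey films

# Users in all three groups form the core audience for a
# Fincher-directed, Spacey-led American "House of Cards".
core_audience = fincher_fans & uk_hoc_watchers & spacey_fans
print(sorted(core_audience))
```

At Netflix's scale, if that intersection contains millions of subscribers, greenlighting the show is far less of a gamble than a traditional pilot.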
To the apparently millions of people who love Kevin Spacey, the British "House of Cards," and David Fincher, Netflix's "House of Cards" is a highly relevant product. It was so relevant, and such a runaway success, that it indisputably boosted Netflix's bottom line by adding more than two million subscribers to the service. However, relevancy is only one part of the equation that Netflix must balance: what about the privacy concerns raised by Netflix's collection and use of this data?
Consider a hypothetical subscriber, Joe. Is the data Netflix holds on him private on its own, or is it only 'private' when combined with some other information? Does Joe even know that Netflix is using this information, and if he does, does he care? What recourse, if any, does Joe have for limiting what Netflix can do with it? He may not care much that Netflix is using his browsing habits to put Kevin Spacey in the Oval Office, but what if Netflix sold that information to his employer, who was then able to determine that Joe isn't as productive in the workplace as he claims?
Unfortunately, while these breakthrough technologies are preventing food shortages in Philadelphia and creating insanely popular TV shows like House of Cards and Luke Cage, the underlying mechanics are no different from those behind Target's outing of a teen pregnancy. As a result, the question of privacy must be refined as society and technology advance, or the political, legal, and ethical frameworks it helped create will no longer provide much protection.
Most of this evolution requires, in my opinion, the increased education of the consumer. If Joe is unaware not only of what Netflix knows, but of how Netflix can use it to infer information he never provided, then Joe is unable to articulate his privacy needs either to Netflix or to his elected officials. This becomes difficult because privacy means different things to different people, so companies like Netflix will increasingly rely on socially acceptable minimums. These minimums (e.g., the Privacy Act of 1974, HIPAA) can only exist if citizens like Joe are able to articulate their privacy concerns in the face of emerging technologies.
At the center of this consumer education must be an increased understanding of the value that data has. While companies are becoming increasingly fluent in the interchangeability of data and currency through patent cases like Ultramercial's (whose "advertising as currency" patent was recently upheld), there has not yet been a corresponding awareness among consumers. Because people like Joe do not realize they are paying Netflix in both money and data, they are at a disadvantage both in negotiating directly with companies like Netflix and in electing officials who will address their privacy concerns.
As companies like Uber and Netflix become increasingly prominent, the realization that this data is no longer incidental, but critical, to these companies will become more apparent. Eventually, consumers will realize that the price of their "House of Cards" addiction is much higher than $7.99 a month. But will we have the political, legal, and ethical frameworks with which to address this concern?
Currently, the answer is a resounding no. The data Netflix collects on us is typically not ours once it enters their system; signed away in a License Agreement we never read, our Netflix habits are no more ours than our Facebook gender is. While the ownership of data about how we use technologies like Uber, Netflix, or Facebook may not be a cause for alarm, technology isn't slowing down, and as it advances, so too does the debate.
Consider the autonomous cars that Google, Tesla, Hyundai, and NVIDIA are all building; the privacy implications of these cars are not insignificant. In order to function, these cars have to collect historic and real-time data on where you are, where you're going, and how many people are with you. In his essay "Privacy and Security in the Age of the Driverless Car," Hanley Chew addresses the question of who owns that data.
He goes on to write, "the auto industry remains self-regulating in determining data collection, ownership, retention, and usage policies relating to self-driving cars." This heartbreaking realization is made worse when we consider the far-reaching effects that self-regulation could have. If this information were sold to insurers, they could use your location history, say, frequent stops at fast-food restaurants, to influence your insurance rates. Taken individually, that effect is sobering, but when combined with societal data on the socioeconomic factors that drive impoverished people to rely on fast food, the theoretical abuses and feedback loops are limitless.
To add insult to injury, these companies would be profiting off data they could only have collected because you purchased their $20,000+ car! If you had never purchased their product, they would not have had the means by which to collect and monetize this data, potentially to the detriment of you and society as a whole. Realistically, these companies would not be able to operate with so little oversight, even in a mostly self-regulated industry; and arguably, even if they could, we can't assume that self-regulation will automatically lead to rampant abuse of these technologies.
However, just as countless technological innovations throughout human history were made possible by capitalizing on previously wasted byproducts, data must one day cease to be treated as happenstance and be understood for the value it possesses. It's not enough for the government to protect the physical safety of its citizens through its privacy laws; it must also enable its citizens to be educated and capable enough to fight for their economic security in light of a booming industry. Only in doing so will consumers be able to understand the true cost of their Tesla, House of Cards, or Target membership.