Big Data

Privacy As Currency

Arguments for and against the use of "Big Data" to tailor services and advertisements litter the blogosphere, but one thing is certain: without this data, many of the tools society depends on would be inconceivable. These revolutionary tools, however, aren't without consequences. In one well-known example, captured by Charles Duhigg in his book The Power of Habit, the national retailer Target predicted a teenage girl's pregnancy and sent her relevant advertisements at such an early stage that her family, friends, and boyfriend had not yet been told. The situation caused such an uproar among privacy advocates, and among those put off by its general "creepiness," that Target artificially diluted the accuracy of its algorithms to avoid alienating future customers.

While companies like Target grapple with the nuances of using this data, breakthrough technologies have emerged that enable us to turn our unused rooms into mini-hostels, prevent food shortages in Philadelphia, and create insanely popular TV shows like Luke Cage. Unfortunately, these technologies face the same privacy concerns that Target once grappled with, and the privacy debate continues to evolve. That debate must be refined continuously as society and technology advance, or the political, legal, and ethical frameworks it helped create will no longer provide much protection. Yet while the debate has centered on the safety of consumers and the protection of data, there has been little discussion of the economic security of consumers and their data.

Just as countless technological innovations throughout human history were made possible by capitalizing on previously wasted byproducts, data must one day cease to be treated as happenstance and be understood for the value it possesses. It's not enough for the government to protect only the physical safety of its citizens; it must also enable them to be educated and capable enough to fight for their economic security in the face of a booming industry. Only then will consumers be able to understand the true cost of their consumerism.


Big Data and Privacy

Earlier this week, the President's Council of Advisors on Science and Technology (PCAST) released a seventy-two-page report on the intersection of Big Data and privacy with the unoriginal title Big Data and Privacy: A Technological Perspective. It begins by laying the groundwork with the traditional definition of privacy, rooted in the work of Samuel Warren and Louis Brandeis in 1890. Under this tradition, a privacy infraction can occur in one of four ways:

  1. Intrusion upon seclusion. If a person intentionally intrudes upon the solitude of another person (or their affairs), and the intrusion is seen as "highly offensive," an invasion of privacy has occurred.
  2. Public disclosure of private facts. If a person publishes private facts about someone's life, even true ones, an invasion of privacy has occurred.
  3. Defamation. The publication of untrue facts about someone is an invasion of privacy.
  4. Commercial use of name or likeness. Removing an individual's personal control of their name and/or likeness for commercial gain is an invasion of privacy.

These infractions essentially come down to removing the control an individual has over various aspects of their life (being left alone, selective disclosure, and reputation). PCAST appears to agree, stating a couple of times throughout the report the need for selective sharing and anonymity. The report goes on to address a few philosophical changes in our mindset about privacy that are needed to implement its five recommendations successfully:


  • We must first acknowledge that intercepting private communications is easier than it used to be.
  • We need to extend "home as one's castle" to become "the castle in the clouds."
  • Inferred private facts are just as stolen as real data.
  • The misuse of data and the loss of selective anonymity are the key issues.


The report goes on to state that most of the concern lies with the harm done by the use of personal data, and that the historic way of preventing misuse has been to control access, a measure that is no longer possible in today's nebulous world of data ownership.

Personal data may never be, or have been, within one's possession.

From public cameras and sensors to other people using social media, we simply have no control over who collects data from whom, and we likely never will again. This raises the question of who owns the data and who controls it.

And while the Electronic Frontier Foundation would complain (again) that the report fails to address metadata (even though it equates metadata with actual data in its first few pages), it comes on the eve of a unanimous vote in the House to rein in the National Security Agency, making this a big week for big data privacy advocates.

What is Business Intelligence?

Imagine that you have a small but rapidly expanding business that finds itself with multiple ways of storing data. You started with the best of intentions, planning one central database for all of your resources, but you increasingly find yourself with more software suites requiring very different database management systems, and that's a problem.

It's a problem because, while this information is loosely related, nothing links the data together on its own. After all, SQL Server 2000 does not talk to MySQL 3.23, so how do we analyze the information contained in so many different types of databases? We first enable data relationships through a process known as ETL: Extract, Transform, Load.

This brief video explains how a company might find itself in this situation and how ETL can help it combine data into a central repository known as a data warehouse. The data warehouse holds a snapshot of several databases (like those listed in the video) in one place so that analysts can turn the data into actionable information. This process of turning data into actionable information is known as Business Intelligence.
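
As a rough illustration of the ETL idea, here is a minimal Python sketch. Everything in it is an assumption made for the example: two in-memory SQLite databases stand in for the incompatible source systems, and the table names, column names, and sample rows are hypothetical. It extracts rows from each source, transforms them into one shared shape (a single name column and ISO-style dates), and loads them into a third database that plays the role of the data warehouse.

```python
import sqlite3


def make_source(schema_sql, insert_sql, rows):
    """Create an in-memory database standing in for one source system."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema_sql)
    conn.executemany(insert_sql, rows)
    conn.commit()
    return conn


# Hypothetical source 1: an admissions system with one name column.
admissions = make_source(
    "CREATE TABLE applicants (id INTEGER, full_name TEXT, enrolled_on TEXT)",
    "INSERT INTO applicants VALUES (?, ?, ?)",
    [(1, "Ana Ruiz", "2014-08-25"), (2, "Ben Okafor", "2014-08-26")],
)

# Hypothetical source 2: a billing system with split names and MM/DD/YYYY dates.
billing = make_source(
    "CREATE TABLE accounts (acct INTEGER, first TEXT, last TEXT, start_date TEXT)",
    "INSERT INTO accounts VALUES (?, ?, ?, ?)",
    [(901, "Cara", "Lind", "08/27/2014")],
)


def extract(conn, query):
    """Extract: pull raw rows out of one source system."""
    return conn.execute(query).fetchall()


def transform_admissions(row):
    """Transform: admissions rows already match the target shape."""
    source_id, full_name, enrolled_on = row
    return ("admissions", source_id, full_name, enrolled_on)


def transform_billing(row):
    """Transform: join the name parts and rewrite the date as YYYY-MM-DD."""
    acct, first, last, start_date = row
    month, day, year = start_date.split("/")
    return ("billing", acct, f"{first} {last}", f"{year}-{month}-{day}")


def load(warehouse, records):
    """Load: write the unified records into the warehouse table."""
    warehouse.executemany("INSERT INTO students VALUES (?, ?, ?, ?)", records)
    warehouse.commit()


def run_etl():
    warehouse = sqlite3.connect(":memory:")
    warehouse.execute(
        "CREATE TABLE students "
        "(source TEXT, source_id INTEGER, name TEXT, enrolled_on TEXT)"
    )
    records = [
        transform_admissions(r)
        for r in extract(admissions, "SELECT id, full_name, enrolled_on FROM applicants")
    ]
    records += [
        transform_billing(r)
        for r in extract(billing, "SELECT acct, first, last, start_date FROM accounts")
    ]
    load(warehouse, records)
    return warehouse


warehouse = run_etl()
rows = warehouse.execute(
    "SELECT source, name, enrolled_on FROM students ORDER BY enrolled_on"
).fetchall()
for row in rows:
    print(row)
```

Once both sources share one table, a single query can answer questions that neither system could answer alone. A real pipeline would connect to the actual database servers and handle far messier data, but the three stages stay the same.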

Business Intelligence involves any action required to take a business process (like "enroll a student") to analysis (such as "how many lower-income students enrolled in 2014?") to action ("improve enrollment rates of lower-income students"). These steps vary with the size and scope of operations, but they can typically be reduced to a simple data process that Google has succinctly defined:

  • Prepare
  • Analyze
  • Apply

While not every company will require an ETL process, a data warehouse, or an OLAP cube, every company must prepare its data before it can be analyzed. Similarly, analysis must take place before knowledge can be accurately applied, and every company will have a different method of analysis. The single commonality is that Business Intelligence requires preparation, analysis, and application in order to turn data into profit.
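
To make the prepare, analyze, apply sequence concrete, here is a small Python sketch built on the enrollment example. It is only an illustration under stated assumptions: the records, the field names, the "low" income label, and the 60% target share are all hypothetical, and a real organization would substitute its own data and thresholds.

```python
from collections import Counter

# Hypothetical raw enrollment records; field names and values are assumptions.
RAW_RECORDS = [
    {"student": "s1", "year": "2014", "income": "Low"},
    {"student": "s2", "year": "2014", "income": "high"},
    {"student": "s3", "year": "2014", "income": " low "},
    {"student": "s4", "year": "2013", "income": "low"},
    {"student": "s5", "year": "2014", "income": None},  # incomplete record
    {"student": "s6", "year": "2014", "income": "middle"},
]


def prepare(records):
    """Prepare: drop incomplete records and normalize the remaining fields."""
    cleaned = []
    for record in records:
        if not record.get("income") or not record.get("year"):
            continue
        cleaned.append(
            {
                "student": record["student"],
                "year": int(record["year"]),
                "income": record["income"].strip().lower(),
            }
        )
    return cleaned


def analyze(records, year):
    """Analyze: count enrollments for one year and the lower-income share."""
    in_year = [r for r in records if r["year"] == year]
    counts = Counter(r["income"] for r in in_year)
    total = len(in_year)
    share = counts["low"] / total if total else 0.0
    return {
        "year": year,
        "total": total,
        "low_income": counts["low"],
        "low_income_share": share,
    }


def apply_findings(summary, target_share=0.6):
    """Apply: turn the analysis into a decision (the target is hypothetical)."""
    if summary["low_income_share"] < target_share:
        return "Improve enrollment rates of lower-income students"
    return "Maintain current outreach"


summary = analyze(prepare(RAW_RECORDS), 2014)
action = apply_findings(summary)
print(summary)
print(action)
```

Each function depends only on the output of the one before it, which mirrors the point above: analysis cannot start until the data is prepared, and nothing can be applied until the analysis exists.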

Want to know more? Look for my book, Understanding IT, in January 2015, or subscribe to this RSS feed for more updates and teasers during the writing process. Alternatively, if you have an immediate project that you need help with, please check out my consultation services below.