Open Data: The end of corruption or the beginning of "Big Brother"?
Thought experiments are an often overlooked and underutilized
tool in the toolbox of science, policy, art, and just about any other field one can think of. Einstein famously conducted thought experiments, which he credited with aiding some of his greatest achievements, such as describing light as quanta (photons) and the Theory of Relativity. Schrödinger's cat, Zeno's paradoxes, and the prisoner's dilemma are other well-known thought experiments. The power of the thought experiment lies not in conducting the experiment itself but in deliberately visualizing the consequences of the question being asked.
I find that thought experiments are most impactful when confronting issues muddled by mixed messages. For example, we've all had bosses, parents, sports coaches, or other people in our lives say something like "It's important to take risks," and then, almost in the same breath, "But don't fail." This is not useful guidance and allows no "win" in either direction, because failure often, though not always, goes hand in hand with risk-taking. If a person takes risks and fails, that's a bad outcome. If that same person avoids risk and avoids failure, that too is a bad outcome, because valuable growth opportunities are missed thanks to the mixed message. The tendency for most people is to be conservative, so outcomes end up depending on external chance rather than internal skill and intuition.
The case is much the same in the current debate over the value of open data. Seemingly everyone has an opinion on how "open" data should be, perhaps because data security matters to everyone, from individuals to corporations. Within those opinions, two main camps are emerging: "open data is important and the future" and "open data is dangerous."

Let's first look at the "important" camp. The argument for open data generally goes something like this: 1) we have long-standing global problems such as hunger, poverty, and food insecurity; 2) we have the tools to combat these problems, but it takes collaboration and sharing of data to generate new and innovative insights into how that data can be used; 3) through that collaboration and sharing, coupled with enabling policies from government and supportive actions from the private sector, we can solve these problems. Implicit in those three steps is the idea that we can do so with less capital, time, and resources than under the current, more protective paradigm around data. The theory has sound backing: research shows that proper collaboration increases productivity. Whether a government works for or against the people, increasing the size of the pie increases the size of each piece, even if some pieces end up larger than they should be.

This argument sounds compelling and reasonable. Without proper protection, however, data gained through collaboration can have punitive effects and unintended consequences. Take the lawsuit against AgriStats as an example. AgriStats collected data on poultry farms in order to provide comparative data about chicken production, giving operators targeted areas for improvement so they could increase efficiency and maximize profit.
The data was anonymised, so farmers believed it was secure and were willing to exchange it for access to the service. Larger companies further up the supply chain then bought this anonymous, aggregated data. So far, nothing out of the ordinary. However, those companies were allegedly able to reverse engineer the identifying information, pressure farmers, and engage in what amounted to price fixing. Again, this is all alleged, but the case shows the care that must be taken to use open data responsibly.
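To see how an "anonymous" dataset can be reverse engineered, here is a minimal sketch of a linkage attack. Every farm name, region, and number below is invented purely for illustration; the point is only that a unique combination of innocuous attributes can act as a fingerprint that links anonymized records back to a public directory.

```python
# Toy illustration of a linkage attack on "anonymized" data.
# All names, regions, and figures are invented for this example.

# Anonymized production records: identity removed, but quasi-identifiers
# (region, flock size) remain.
anonymized = [
    {"region": "north", "flock_size": 12000, "cost_per_bird": 1.82},
    {"region": "south", "flock_size": 30000, "cost_per_bird": 1.61},
    {"region": "north", "flock_size": 45000, "cost_per_bird": 1.55},
]

# Public information an attacker might already hold (permits, registries).
directory = [
    {"farm": "Acme Poultry", "region": "north", "flock_size": 12000},
    {"farm": "Bluegrass Farms", "region": "south", "flock_size": 30000},
    {"farm": "Cooper & Sons", "region": "north", "flock_size": 45000},
]

def reidentify(anon_rows, public_rows):
    """Match anonymized rows to named entities on shared quasi-identifiers."""
    matched = []
    for row in anon_rows:
        candidates = [p for p in public_rows
                      if p["region"] == row["region"]
                      and p["flock_size"] == row["flock_size"]]
        if len(candidates) == 1:  # a unique combination recovers the identity
            matched.append({"farm": candidates[0]["farm"], **row})
    return matched

for record in reidentify(anonymized, directory):
    print(record["farm"], record["cost_per_bird"])
```

In this toy case every record has a unique (region, flock size) pair, so all three "anonymous" rows are re-identified, sensitive cost data included. Real attacks are more sophisticated, but the principle is the same.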
So, at first glance, our thought experiment suggests that sharing data increases productivity but will almost certainly lead to at least some level of unacceptable abuse with crippling side effects. Conversely, protecting data will drive down productivity, force more regulation, and leave giant tech companies as guardians of every aspect of our digital lives. In the first case, the benefits of sharing are weighed against the potential for abuse. In the second, the benefits of protection are weighed against the consequences of centralized power in the new data economy.
Two wrenches can also be thrown into this experiment. First, the sheer amount of data being created now is staggering: by some estimates, more data has been produced in the last two years than in the entire prior history of humanity. New oil reserves were never found at this rate, meaning that opportunities are supposedly emerging faster than their business and societal value can be fully realized. With the Internet of Things (essentially all devices connected and talking to each other) growing exponentially, the amount of data is projected to increase tenfold by 2020. Second, millennials are changing the data economy. A research project by Mintel found that 60% of millennials are willing to share personal data, compared to 30% of baby boomers. Of the millennials who said they would not share data at all, 30% would change their mind if offered something as small as a $10 coupon.
So how do these wrenches affect the thought experiment? In the first case, where open data is shared and unintended consequences increase, those consequences are mitigated if people don't actually view them as consequences. Understandably, this seems like a weak argument given that millennials still expect a certain level of trust and stewardship of their data. But if the value of that data is only a $10 coupon, just how much trust and stewardship are millennials really expecting? And if data explodes tenfold in the next two and a half years while only 0.5% of all data is analyzed today, is it really possible to extract much meaning from it? Collaboration can make better use of this data and provide societal, business, and research benefits.
In the second case, where data is protected, data itself could become a barrier to entry for competition. Standard Oil is the classic example of a monopoly: in past anti-trust lawsuits, a company's size and market share relative to its competitors were the main points of argument. Size alone, especially today, is not inherently damning in an anti-trust case, but in today's economy a company's digital assets can be just as important. Facebook bought the tech startup WhatsApp for $22 billion, despite WhatsApp having little to no revenue to speak of. Did Facebook find some valuable market share it wanted to tap into, or was this an early warning of data hurting competition? Put another way, are these powerful, yet beneficial, tech companies going to remain beneficial, or will they create new barriers to entry? I'm not sure, but to date the benefits far outweigh the consequences. Still, with more and more big data analysts, deep learning techniques, and artificial intelligence, can we really expect anonymous data to remain anonymous no matter what regulations are put in place? Put another way, data will become open eventually, so we can either choose how that happens or have it forced upon us.
Now our thought experiment is getting good as we’re painting a future of exploding data, a generation that is more open about privacy, and an economy whose barriers to entry depend on more than just physical size and market capital. All the while we have to consider the consequences of today. Either path has benefits and consequences that can be promoted and mitigated, respectively. From a moral standpoint, both arguments help and hurt people, so how can we choose what is best?
I wrote recently about the case for the philosopher, which you can find in my blog history. This is a good case for a philosopher, and a good one to pattern our argument after is John Rawls. An American political and moral philosopher, Rawls received the National Humanities Medal in 1999 from then-President Bill Clinton, who remarked that Rawls' work "helped a whole generation of learned Americans revive their faith in democracy itself." Rawls also happened to make famous use of thought experiments, most notably his "veil of ignorance." In this thought experiment, Rawls asks us to deliberate about justice as equals, stripped of any knowledge of our own particular circumstances. The idea is that if a person knows nothing about himself or herself (race, gender, religion, wealth, etc.), that person cannot argue from a point of self-interest, and the principles chosen will create a just and moral society.
So let's assume nothing about ourselves for the moment. We know everything about how data is used and collected now, how the amount of data will grow in the future, and which problems we need to solve. We don't, however, know which role we play in this world. We could be the CEO of a major data company, or we could be a food-insecure person with no access to data as a valuable commodity. What would we want a data-rich society to look like? 1) We would want to extract as much information as possible from that data, to get the most societal benefit out of it. 2) We would want to be able to use that data to build products and businesses that can compete fairly with other businesses. 3) We would want some form of reciprocity. 4) We would want to be protected from unintended consequences to the greatest extent possible. 5) We would want some form of recourse if we are wronged.
It would seem that an open data future, by a slight margin, is the way to go. However, just as what constitutes an anti-trust case may need updating for the data economy, so too does how we live and work. Assuming we are eventually headed toward an open data world, let's consider three roadblocks that many struggle to overcome.

First, individual data protection. No one wants to be the victim of identity theft, just as no farmer wants to be the victim of alleged price fixing in the example above. Whether data is open or closed, data security firms will keep working to anonymise and protect the most personal data, and other actors will keep trying to uncover it and, purposefully or accidentally, release information that should not be released and cause harm. Given that, it is important that punishments deter these courses of action. Some evidence from the U.S. suggests governments may be thinking along these lines: punishments for identity theft have been increased and can carry as much as 15 years in prison. That's a good deterrent, though stealing money from a corporation, by comparison, is a class B felony and can carry up to 20 years.

Additionally, as we record more and more of our lives online, we build a profile of our activities. The drawback of such a profile is perhaps being exposed to only some of the advertisements and products available; the benefit is that identity theft could be caught and stopped much sooner. Identity theft is hard to prove and recover from, but more and open data could lead to algorithms that predict a problem, monitor the situation, and lead authorities to the criminal. If that sounds far-fetched, consider the sports world. The point-shaving scandal in the Toledo college football program was initially caught by Las Vegas sports gamblers, who reported the program's suspicious performance in 2005. How did they do it? With lots of data and predictive algorithms.
The more open the data, the better the algorithms can be to catch unwanted actions.
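As a hedged illustration of how such an algorithm might work (the teams, margins, and threshold below are all invented), one simple approach is to test whether a team's average performance against the betting spread drifts too far from zero to be plausible as chance:

```python
import math
import statistics

# Invented data: per-game difference between the actual margin of victory and
# the betting spread. A clean team should scatter around zero; a compromised
# team drifts negative, consistently underperforming the spread.
games = {
    "team_a": [3, -2, 5, 1, -4, 2, 6, -1],        # normal scatter
    "team_b": [-9, -7, -11, -6, -8, -10, -7, -9], # suspicious drift
}

def suspicious_teams(games, z_threshold=2.0):
    """Flag teams whose mean deviation from the spread is a statistical outlier."""
    flagged = []
    for team, diffs in games.items():
        mean = statistics.mean(diffs)
        stderr = statistics.stdev(diffs) / math.sqrt(len(diffs))
        score = mean / stderr  # standard errors away from "fair" (zero)
        if abs(score) > z_threshold:
            flagged.append(team)
    return flagged

print(suspicious_teams(games))  # → ['team_b']
```

Real fraud-detection models are far richer than this z-score sketch, but the underlying logic is the same: with enough open data, systematic deviations from expected behavior become statistically impossible to hide.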
A second roadblock is in the research sector. Large granting agencies are already calling for more, and larger, collaborations on projects; they recognize the value of the "wisdom of crowds." But these grants still require a primary set of investigators, and where there used to be room for a junior faculty investigator seeking tenure, there is now only room for tenured, well-known faculty members as primary investigators. Additionally, the odds of a junior faculty member becoming a first or corresponding author on these projects are slim to none. This matters because tenure, especially at major research universities, is driven almost entirely by how much a faculty member publishes as a first or corresponding author; being a second, third, or lower-order author is of essentially zero value to these junior faculty members. So, let's assume a junior faculty member has a small research budget cobbled together from a variety of smaller grants, has done some incredibly novel work, and wants to publish that data. That sounds great until the researcher catches the eye of senior faculty members who are encouraging collaboration with major international teams on a broad topic of major importance to society. It's a great opportunity and a wonderful way to build a powerful network and increase collaborative efforts. Unfortunately, the buy-in is access to the researcher's data and relegation to a lower-order authorship. It's a tough situation in the publish-or-perish world of academia, and most will choose to forgo the partnership to secure promotion and tenure first. Does that mean that open data is inherently bad, or does it mean that the process by which this work is evaluated is out of date? In a world where systems thinking, rather than single-discipline thinking, is becoming the norm, how can we integrate it into the tenure process?
If we can agree that collaboration and sharing of data and resources are the way forward, then we need to ensure our future academic leaders are not punished for adhering to this approach and driven out of academia entirely.
The last roadblock is the business habit of treating closely held data as a competitive advantage. Joint ventures, collaborations, and transactions between businesses require all sorts of non-disclosure agreements and data protections, to ensure that whatever competitive advantage (or intellectual property) a business believes it has is given away only knowingly, in exchange for immediate value or greater potential value over time. I have given this topic a lot of thought over the years, and I do believe that protecting data usually comes at the expense of innovation. I also find that companies are more likely to share data "because they can" rather than "because they should." Small and medium-sized companies, however, realize they are not in a fair fight with large, established corporations, and many countries have become rentier economies due to market imperfections. Even where the playing field is level, the court systems are not: a large corporate partner can sue a small or medium-sized enterprise, and even when the larger partner is wrong (often knowingly so), the cost of prolonged court battles puts the smaller business under duress, often leading to a sale below market value. Given the risk and work founders put into their businesses, it seems sensible to protect data and be very careful about partnerships.

That said, with the new, more socially responsible consumer (again armed with more data than ever) and the alternative ways a business owner can now raise money (think crowdfunding and other access points), the marketplace seems to be catching up to these concerns. The world is a more open marketplace, and distribution channels are more available everywhere. Furthermore, little strong evidence of the data misuse, price fixing, collusion, and other abuses consumers fear seems to be occurring among the largest businesses in the tech space.
By and large, the big 5 tech companies (Alphabet, Amazon, Apple, Facebook, and Microsoft) have made our lives significantly better. Beyond their core businesses, Alphabet and Amazon run cutting-edge ventures that push the boundaries of innovation, Apple continues to revolutionize personal technology, Facebook's founder recently advocated for a universal basic income (perhaps a blog topic for another day), and Microsoft co-founder Bill Gates is using his fortune to tackle the world's greatest humanitarian crises. In fact, Bill and Melinda Gates are arguably the most effective large-scale philanthropists in history. Rockefeller made more money (adjusted for inflation) and gave more over time, but in terms of return on the dollars invested, the Gateses are the best. And what does the Gates Foundation want? Collaboration, sharing, and openness.
Increasing productivity can be good for both society and business, raising living standards and increasing profits. The question shouldn't be "should we be more open with data or more protective of it?" but rather "how do we subtly alter the way we live and work so that we set ourselves up, both individually and collectively, for success?"