Permeable Data Sources
Posted: January 18, 2012
Open source data has been around for a while: a way to create a data set that is open and free for everyone to use. In fact, the New York Times even ran a story about this approach in today’s paper (‘Open Science’ Challenges Journals, http://nyti.ms/yCyzBX). Open data archives allow scientists to easily use others’ data without having to explicitly ask permission. This open process is the radical extreme of “information should be free” politics. And it is hard to see how an organization can effectively monetize the effort it went through in collecting the data.
Closed data sets are much more the norm. Companies own the data. They control the data. They ration out (or sell) access to the data. The data belongs to them and is under their control, even if it is data about someone else. Visa now controls a huge data set about me and what I buy. Powerful information they sell to the highest bidder. But the unfortunate reality is that these data sources lose value because of their isolation. Each data set is an island unto itself. The proprietary nature of the data set prevents it from being shared or connected to other data. It becomes less valuable because it stands alone.
But permeable data sources are different (and if anyone can find a reference to this exact term anywhere, let me know; I want to trademark it). Permeable data is betwixt and between. It isn’t open. And it isn’t closed. It can be controlled, but not exclusively by the person who owns the repository. Instead, the data is controlled by those who contributed the data. In a permeable data source, you control the data about you. Think of it as a data locker. A company comes forward to provide you with a way to store information about yourself. You decide what to contribute to that data set. Then you are in control of who can access and use that data. In essence, you can sell or barter information about yourself in exchange for something of value, whether money or something else.
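To make the data locker idea a little more concrete, here is a minimal sketch in Python. Everything in it is hypothetical and invented for illustration: the `DataLocker` class, its method names, and the sample data. It is not how Facebook or any real service works, only one way the "contributor controls access" rule could be expressed.

```python
from dataclasses import dataclass, field


@dataclass
class DataLocker:
    """Hypothetical locker: a repository holds the data, the contributor controls access."""
    owner: str
    _records: dict = field(default_factory=dict)  # field name -> value
    _grants: dict = field(default_factory=dict)   # requester -> set of field names

    def contribute(self, name, value):
        # The owner decides what goes into the locker.
        self._records[name] = value

    def grant(self, requester, *names):
        # The owner allows a requester to see specific fields.
        self._grants.setdefault(requester, set()).update(names)

    def revoke(self, requester, *names):
        # The owner can withdraw access at any time.
        self._grants.get(requester, set()).difference_update(names)

    def read(self, requester):
        # A requester sees only the fields it has been granted.
        allowed = self._grants.get(requester, set())
        return {k: v for k, v in self._records.items() if k in allowed}


# Sample usage with made-up data.
locker = DataLocker("alice")
locker.contribute("favorite_band", "Wilco")
locker.contribute("zip_code", "60614")
locker.grant("music_app", "favorite_band")
print(locker.read("music_app"))   # only the granted field is visible
locker.revoke("music_app", "favorite_band")
print(locker.read("music_app"))   # nothing is visible after revoking
```

The point of the design is that the repository never decides who sees what. Every read goes through the grants that the contributor set, which is what separates a permeable source from a closed one.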
For me, Facebook through its OpenGraph technology is an example of a permeable data source. And it is an incredibly powerful one. When you authorize an OpenGraph application, you give the app access to some or all of a menu of data about yourself. You are providing a connection to the information that you have contributed to Facebook through the routine interactions you have with it. Postings, tags, likes, whatever: all of it becomes part of the dataset about you that an application can make use of. So this for me is a permeable data source. And in thinking about Facebook in this way, it becomes clear that valuable permeable data sources have some important things in common.
Collects Large Amount of Relevant Data.
The permeable data source needs to have a large set of individuals or entities providing information. Scale is important. And the data must be relevant to a variety of different uses. This is the biggest challenge for something like Facebook: we are contributing tons of information, but how relevant is it? What can it be used for? Data in itself is worthless. Relevant data is priceless. The challenge here becomes developing ways to render the data relevant.
Easy to Gain Access to Data.
There have to be mechanisms that render access to the data frictionless. We want the creative power to go into innovative ways to use the data. If there is significant overhead required to gain access to the data, then the quality of the use suffers. Facebook is fantastic at this. Apple through its iTunes store is not. A developer can easily create a Facebook app. It requires no explicit permission from Facebook. But Apple requires an app to be approved according to its criteria before it can be released.
Data Must Be Secure.
Aside from the threats to privacy, there is an economic imperative behind keeping the data secure. Hacking the system essentially bypasses the control of the data provider (in this case both the user and the repository). When the control is an essential enticement to using the data source, losing that control renders the data source much less valuable.
Connected to Other Data.
The permeable data source needs to be capable of being connected to other uses. This connectedness amplifies the value and application of the data set. Now through the OpenGraph, Facebook allows the connection of apps outside of Facebook with data inside it. I need to do more work on this. But it is an important idea nonetheless.
So I want to talk more about permeable data sources. How would they work for our companies?