The Conceptual Regression Equation

Honestly, I am not sure if this will work but I want to try it. I want us to think about creating a conceptual Regression Equation for each of our possible client companies. The assignment for Thursday’s class is at the bottom of this post. But first let me quickly review how we talked about multiple regression in class.
Multiple Regression Review.
Remember the purpose of a regression equation is to determine how independent variables contribute to a dependent variable. Generally they take the form of
Y = a + B1X1 + B2X2 + B3X3 + …
Y = dependent variable — the thing we want to predict
a = the constant (throw this out — we don’t need it)
X = the independent variable — the things we use to predict the dependent
B = the weight on each X (what we called the loading in class; strictly, the standardized regression coefficient). Like a correlation coefficient, it usually falls between -1 and +1, and it tells us how much that X contributes to the dependent variable.
Remember our extended example in class. We wanted to predict the amount that someone tips a waitperson. Y = amount of tip. We thought about all the possible factors that contributed to it — the independent variables. Quality of service. Quality of food. Total amount of the bill. Those are the Xs.
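To make the tipping example concrete, here is a minimal Python sketch of plugging values into the equation. Everything numeric in it is an assumption made up for illustration: the weights, the 1-to-10 rating scales, and the sample dinner were not estimated from any data, and the name predict is just a placeholder.

```python
def predict(constant, weights, values):
    """Return Y = a + B1*X1 + B2*X2 + ... for one observation."""
    if len(weights) != len(values):
        raise ValueError("need exactly one weight per independent variable")
    return constant + sum(b * x for b, x in zip(weights, values))


# Hypothetical weights (not estimated from real data), in this order:
# quality of service (1-10), quality of food (1-10), total bill (dollars).
TIP_WEIGHTS = [0.5, 0.25, 0.125]
TIP_CONSTANT = 0.0  # the post throws the constant out, so it is set to zero

# One made-up dinner: service rated 8, food rated 6, bill of $40.
tip = predict(TIP_CONSTANT, TIP_WEIGHTS, [8, 6, 40])
print(f"predicted tip: ${tip:.2f}")
```

With these invented numbers the bill contributes the most to the predicted tip, but only because it is measured in dollars while the ratings top out at 10. In a real analysis the weights would be estimated from observed tips, and they would need to be standardized before being compared with each other.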
So now we understand the basics of regression.
A Conceptual Multiple Regression Equation.
Focus on the main purpose of the multiple regression equation as prediction. We want to use some things to predict something. And if you really want to get manipulative, you can see a multiple regression equation as a recipe. We can manipulate the ingredients (independent variables) to determine the final product (the dependent variable).
So now I want to take the most abstract approach to multiple regression equations and see if we can use it to help determine what we want to do with our companies.
You have been thinking about and researching your companies. Now we need to determine if we can actually use this approach to help our client companies deliver value to their constituencies. Wow, that sounded very vague, abstract, and downright hard. Here is what I mean.
For a company, what is it that they want to accomplish? Let me try an example. What does GoogleTV want to do?
Here are a couple of links about Google TV.
Ed Cotton: Is There A TV Revolution Happening? @PSFK
What’s Really Next for Apple in Television
In the simplest terms, I believe the true value of GoogleTV is going to be predicting what TV program you want to watch. So in the simplest conceptual terms, think of the dependent variable (Y) as program recommendation (at the end of this extended example I will tell you about the problem with this example, but just swing with it for a minute). What can it use to make the best possible prediction? In other words, what are the independent variables?
X1 = previous viewing. I think that would have a high loading.
X2 = time of day — I like to watch certain things at certain times of day.
X3 = my mood. Do I need the TV to lift me up or calm me down?
X4 = my other interests (am I a sports fan?)
So I could put together a conceptual algorithm that allows a GoogleTV (or AppleTV or connectedTV) to make accurate suggestions. And then, of course, I would add a feedback variable to the equation to measure its value — did I actually choose to watch it and how much did I watch.
So that would be my conceptual algorithm for predicting what TV programs to suggest. [And on a technical note, this isn’t a good example because the TV program isn’t a numeric variable at all; it is nominal data (categories), so you can’t calculate an ordinary correlation. But hang with it for just a few more minutes.]
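One way to picture this conceptual algorithm, and to sidestep the nominal-data problem, is to give every candidate program its own score from the four Xs and then suggest the highest scorer. The Python sketch below is only that: the weights, the 0-to-1 scaling of each X, and the two programs are all made up for illustration, and nothing here reflects how GoogleTV actually works.

```python
# Hypothetical weights for the four independent variables in the post.
WEIGHTS = {
    "previous_viewing": 0.5,   # X1: assumed to carry the highest weight
    "time_of_day": 0.25,       # X2
    "mood": 0.125,             # X3
    "other_interests": 0.125,  # X4
}


def score(features):
    """Weighted sum of one program's feature values (each assumed 0 to 1)."""
    return sum(WEIGHTS[name] * features.get(name, 0.0) for name in WEIGHTS)


def recommend(candidates):
    """Return the name of the candidate program with the highest score."""
    return max(candidates, key=lambda name: score(candidates[name]))


# Made-up programs and feature values for one viewer at one moment.
candidates = {
    "late-night comedy": {"previous_viewing": 1.0, "time_of_day": 0.5,
                          "mood": 0.5, "other_interests": 0.0},
    "live football":     {"previous_viewing": 0.5, "time_of_day": 0.0,
                          "mood": 0.5, "other_interests": 1.0},
}
print(recommend(candidates))
```

The feedback variable (did I actually watch the suggestion, and for how long) is left out of the sketch. In a fuller version it would be the signal used to adjust the weights over time.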
Is this a valuable algorithm? I would say yes. Helping people get the program they want to watch is good for them (no duh). And I would argue that it is good for the program producer: in my mind producers are still implementing some form of advertising, so getting viewers a program they like will encourage them to watch more.
So here is the assignment. Can you come up with a conceptual multiple regression equation (heck, let’s just call it an algorithm) for your company? Is there something the company wants to predict? Is there something it wants to show its users? Is there something it wants them to do? And are there things that might help motivate them to do that? Are there ingredients that we can draw from social media to get the user there?
OK, this is an example of one of those digressions that you think sounds brilliant on the way in and by the time you are at the end you wonder if it makes any sense. So tell me on Thursday if this works. And if we can’t at least conceptualize the dependent variables for the companies, then I believe that ReallyGetsMe won’t work for them.
We have to leave class on Thursday with our companies in hand.
Good luck.

Permeable Data Sources: Better Health Via Facebook OpenGraph

I have to do a presentation at a conference in about a month. It is the Digital Health Conference Extravaganza. I just revamped the abstract for my talk. Here it is below. Let’s think about talking about this in class.

Permeable Data Sources: Better Health Via Facebook OpenGraph
Scott A. Shamp, Ph.D.
New Media Institute
University of Georgia

What you do impacts how healthy you are – and vice versa. Understanding an individual’s behavior can suggest ways to be and stay healthier. The intricate interplay between behavior and health blesses and bedevils us. If we understand people’s behaviors, we can make salient suggestions. But how can we learn what people are doing? Monitoring is difficult and expensive. Self-reporting is impoverished and often lacks validity. We need a way to measure behavior that provides voluminous data that is authentic. We also need an easy and secure method to use this data.

The explosive growth of social media presents exciting opportunities for health promotion. And new developments in this area offer novel mechanisms for helping people make better health behavior decisions. In April of 2010, Facebook announced the OpenGraph protocol. OpenGraph allows the development of Facebook programs called “apps” which can, with the permission of the user, analyze and react to users’ Facebook data. Through OpenGraph, Facebook is now a “permeable data source” for individuals’ behaviors. Permeable data sources allow individuals to contribute data into a centralized repository and then determine who can access that data and for what purposes. Now the ability to easily write Facebook OpenGraph apps provides an unprecedented opportunity to use actual behaviors in formulating healthy suggestions and promoting healthy decisions.

Can we develop systems that access the unique information found in these social media permeable data sources? And can we do it in a secure and responsible manner?

Permeable Data Sources

Open source data has been around for a while: a way to create a data set that is open and free for everyone to use. In fact, the New York Times even did a story about this approach in today’s paper (‘Open Science’ Challenges Journals). Open data archives allow scientists to easily use others’ data without having to explicitly ask permission. This open process is the radical extreme of “information should be free” politics. And it is hard to see how an organization can effectively monetize the effort it went through in collecting the data.
Closed data sets are much more the norm. Companies own the data. They control the data. They ration out (or sell) access to the data. The data belongs to them and is under their control, even if it is data about someone else. Visa now controls a huge data set about me and what I buy: powerful information they sell to the highest bidder. But the unfortunate reality is that these data sources lose value because of their isolation. Each data set is an island to itself. The proprietary nature of the data set prevents it from being shared or connected to other data. It becomes less valuable because it stands alone.
But permeable data sources are different (and if anyone can find a reference to this exact term anywhere, let me know — I want to trademark it). Permeable data is betwixt and between. It isn’t open. And it isn’t closed. It can be controlled, but not exclusively by the person who owns the repository. Instead, the data is controlled by those who contributed the data. In a permeable data source, you control the data about you. Think of it as a data locker. A company comes forward to provide you with a way to store information about yourself. You decide what to contribute to that data set. Then you are in control of who can access and use that data. In essence you can sell or barter information about you for some type of value — money or other types of value.
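The data locker idea can be sketched in a few lines of Python. This is a toy model under stated assumptions, not Facebook's API or any real product: the class name DataLocker, its methods, and the sample user, app, and fields are all hypothetical. The one property it tries to capture is that the contributor, not the repository, decides which app may read which fields.

```python
class DataLocker:
    """Toy permeable data source: contributors decide who may read their data."""

    def __init__(self):
        self._records = {}  # owner -> {field: value}
        self._grants = {}   # owner -> {app: set of fields the app may read}

    def contribute(self, owner, field, value):
        """The owner adds a piece of data about themselves."""
        self._records.setdefault(owner, {})[field] = value

    def grant(self, owner, app, fields):
        """The owner allows an app to read the named fields."""
        self._grants.setdefault(owner, {}).setdefault(app, set()).update(fields)

    def revoke(self, owner, app):
        """The owner withdraws all access previously given to an app."""
        self._grants.get(owner, {}).pop(app, None)

    def read(self, owner, app):
        """Return only the fields the owner has opened to this app."""
        allowed = self._grants.get(owner, {}).get(app, set())
        record = self._records.get(owner, {})
        return {f: v for f, v in record.items() if f in allowed}


locker = DataLocker()
locker.contribute("pat", "likes", ["running", "cooking"])
locker.contribute("pat", "purchases", ["sneakers"])
locker.grant("pat", "health_app", ["likes"])
print(locker.read("pat", "health_app"))   # only the "likes" field
locker.revoke("pat", "health_app")
print(locker.read("pat", "health_app"))   # nothing after revocation
```

An app that was never granted anything gets an empty result rather than an error, so in this sketch closed is the default and the contributor has to open each field explicitly.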
For me, Facebook through its OpenGraph technology is an example of a permeable data source. And it is an incredibly powerful one. When you authorize an OpenGraph application, you give the app access to some or all of a menu of data about yourself. You are providing a connection to the information that you have contributed to Facebook through the routine interactions you have with it. Postings, tags, likes, whatever become a part of the dataset about you that an application can make use of. So this for me is a permeable data source. And in thinking about Facebook in this way, it becomes clear that valuable permeable data sources have some important things in common.
Collects Large Amount of Relevant Data.
The permeable data source needs to have a large set of individuals or entities providing information. Scale is important. And the data must be relevant to a variety of different uses. This is the biggest challenge for something like Facebook: we are contributing tons of information, but how relevant is it? What can it be used for? Data in itself is worthless. Relevant data is priceless. The challenge here becomes developing ways to render the data relevant.
Easy to Gain Access to Data.
There have to be mechanisms that render the access to the data frictionless. We want the creative power to go into innovative ways to use the data. If there is significant overhead required to gain access to the data, then the quality of the use suffers. Facebook is fantastic at this. Apple through its iTunes store is not. A developer can easily create a Facebook app. It requires no explicit permission from Facebook. But Apple requires an app to be approved according to its criteria before it can be implemented.
Data Must Be Secure.
Aside from the threats to privacy, there is an economic imperative behind keeping the data secure. Hacking the system essentially bypasses the control of the data provider (in this case both the user and the repository). When the control is an essential enticement to using the data source, losing that control renders the data source much less valuable.
Connected to Other Data.

The permeable data source needs to be capable of being connected to other uses. This connectedness amplifies the value and application of the data set. Now through the OpenGraph, Facebook allows the connection of apps outside of Facebook with data inside it. I need to do more work on this. But it is an important idea nonetheless.
So I want to talk more about permeable data sources. How would they work for our companies?