Factor Analysis

Factor analysis is defined by dictionary.com as “the use of one of several (statistical) methods for reducing a set of variables to a lesser number of new variables, each of which is a function of one or more of the original variables.” You may be thinking, ‘What the heck does that mean and how does it relate to our uses of big data?’ I’m no math whiz, much less a statistical genius, but I can understand the theories here and what they have to do with us and Really Gets Me. Factor analysis is a statistical process for gleaning pertinent information from a large set of variables. It reduces a large number of variables to a small number that are useful to the analyst and, if used correctly, is extremely helpful in finding patterns. Patterns are exceedingly important to us as proponents of big databases, because they can tell us what a person does on a daily basis, what they are interested in, where they like to shop; the list goes on and on. Factor analysis takes a set of observed, related variables and reduces them to a smaller number of unobserved and potentially unrelated factors, then describes the variability among the observed variables in terms of those factors. This smaller number of factors is what we’re interested in because it can show us things about a group of people or data that we could not see before.

Big data sets are just that: LOTS of data. There has to be a way to limit these data sets so that an analyst is not forced to sift through it all to find correlations and variations in it, so factor analysis is quite helpful in this business of big data. If you’re still confused about factor analysis and what it is exactly, here’s an example from Johns Hopkins University:

We have a concept of what “frailty” is, but we can’t measure it directly. We think it combines strength, weight, speed, agility, balance, and perhaps other “factors”, so we break it down into things we can measure, such as grip strength, arm circumference, BMI, and speed of walk.

So in this example, all of the measured variables (BMI, grip strength, etc.) are combined into one factor called ‘frailty’. We can use that factor to describe someone, and we can estimate their level of frailty from the measured variables. Another example:

Measurements of how fast and how long a person can run, how high they can jump, their batting average, how many balls they can catch, and how much weight they can lift can all be combined to describe that person’s general athletic ability.
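To make the athletic ability example a little more concrete, here is a rough sketch in plain Python. Everything in it is made up for illustration: the four variable names, the "true" loadings, and the simulated people are all assumptions, not real data. It also uses the simple principal-component method (power iteration on the correlation matrix) to pull out one factor, which is only an approximation of what a full statistics package does.

```python
import math
import random

def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [math.sqrt(sum(x * x for x in c)) for c in centered]
    return [[sum(a * b for a, b in zip(ci, cj)) / (ni * nj)
             for cj, nj in zip(centered, norms)]
            for ci, ni in zip(centered, norms)]

def first_factor_loadings(corr, iterations=200):
    """Loadings on the leading factor (principal-component method, power iteration)."""
    k = len(corr)
    v = [1.0] * k
    for _ in range(iterations):
        w = [sum(corr[i][j] * v[j] for j in range(k)) for i in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # Rayleigh quotient gives the eigenvalue that belongs to the unit vector v.
    eigenvalue = sum(v[i] * sum(corr[i][j] * v[j] for j in range(k))
                     for i in range(k))
    return [math.sqrt(eigenvalue) * x for x in v]

random.seed(0)

# Hypothetical setup: one hidden "athletic ability" score drives four measurements.
true_loadings = {"sprint": 0.8, "jump": 0.7, "lift": 0.6, "catch": 0.75}
ability = [random.gauss(0, 1) for _ in range(500)]
columns = [[w * a + math.sqrt(1 - w * w) * random.gauss(0, 1) for a in ability]
           for w in true_loadings.values()]

# We only "observe" the four columns; the factor is recovered from their correlations.
loadings = first_factor_loadings(correlation_matrix(columns))
for name, value in zip(true_loadings, loadings):
    print(f"{name}: {value:.2f}")
```

All four measurements should come out loading strongly on the same single factor, which is the whole point: four columns of numbers collapse into one "ability" dimension. The principal-component method tends to overstate loadings somewhat compared with the values used to simulate the data.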

Practical example from a prohealth.com study:

In an exploratory factor analysis of two different samples (n = 128 and n = 170), cognitive symptoms of fibromyalgia loaded on 2 dimensions: cognition and mental clarity (← THE UNOBSERVED FACTORS!). The mental clarity factor comprised 8 items with factor loadings greater than .60 and was named the Mental Clutter Scale. These items are: ‘spaciness’, looking at life through a haze, confusion, cluttered thinking, fogginess, rushing thoughts, fuzzy headedness, and information overload (← THE OBSERVED VARIABLES!).
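A factor loading is roughly how strongly an observed item is tied to a factor, and a cutoff like .60 decides which items get to belong to a scale. Here is a small sketch of that sorting step. The numbers are invented for illustration (the study's actual loadings are not given here), and the two `hypothetical_item` entries are placeholders, not items from the study.

```python
CUTOFF = 0.60

factor_names = ("cognition", "mental clarity")

# Illustrative loadings only: (loading on cognition, loading on mental clarity).
loadings = {
    "fogginess": (0.21, 0.78),
    "confusion": (0.30, 0.71),
    "hypothetical_item_a": (0.74, 0.18),
    "hypothetical_item_b": (0.41, 0.38),  # loads weakly on both, so it is dropped
}

def assign_items(loadings, factor_names, cutoff=CUTOFF):
    """Group items under every factor where their absolute loading beats the cutoff."""
    scales = {name: [] for name in factor_names}
    for item, values in loadings.items():
        for name, value in zip(factor_names, values):
            if abs(value) > cutoff:
                scales[name].append(item)
    return scales

scales = assign_items(loadings, factor_names)
for name, items in scales.items():
    print(name, "->", items)
```

The absolute value matters because an item can load strongly in the negative direction and still belong to a factor.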

Factor analysis is fairly easy to run once all the data has been collected. Whether the data is from the Internet (most likely in our case), from a survey given to a large number of people, or from another quantitative method, it can be entered into a statistics program such as SPSS, and the factor analysis procedure can be run to reduce the variables to a smaller, more meaningful set. The factor analysis feature in the statistical program will isolate the underlying, and thus unobserved, factors that best explain the data. Again, this is very helpful for making sense of big data sets, because it is difficult to sift through data about every aspect of a person’s life and pick out what is most important for a particular client. With factor analysis, we don’t have to do that. We could look at the Twitter accounts of a number of potential customers for a product, analyze their tweets and who they are following, and determine if, for example, travelling is something they are interested in.
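Once a program has given us loadings for a factor, we can turn them into a score for each person. The sketch below does that for the travel example with a very coarse method: standardize each variable, then add them up weighted by their loadings. The users, the counts, the variable names, and the loadings are all hypothetical, and a real statistics package would normally use a more careful scoring method.

```python
import statistics

def zscores(values):
    """Standardize a column to mean 0 and standard deviation 1."""
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return [(v - mean) / sd for v in values]

# Hypothetical per-user counts pulled from Twitter activity.
users = ["user_a", "user_b", "user_c", "user_d"]
variables = {
    "travel_hashtags": [12, 0, 3, 7],
    "travel_accounts_followed": [9, 1, 2, 6],
    "airline_mentions": [4, 0, 1, 2],
}

# Hypothetical loadings of each variable on an "interest in travel" factor.
loadings = {
    "travel_hashtags": 0.8,
    "travel_accounts_followed": 0.7,
    "airline_mentions": 0.6,
}

standardized = {name: zscores(vals) for name, vals in variables.items()}

# Coarse factor score: loading-weighted sum of each user's standardized values.
scores = {
    user: sum(loadings[name] * standardized[name][i] for name in variables)
    for i, user in enumerate(users)
}

ranked = sorted(scores, key=scores.get, reverse=True)
for user in ranked:
    print(f"{user}: {scores[user]:+.2f}")
```

Users near the top of the list are the ones whose activity lines up most with the travel factor, so they would be the first candidates for a travel-related product.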

The Wall Street Journal and a company called IndexUniverse recently completed a study that used factor analysis to determine how much investors are overpaying per year for actively managed mutual funds in the US stock market. The study narrowed all of the variables involved down to three factors (beta, size, and style of the funds) to find ways to be more effective with the investments.

Using factor analysis, Really Gets Me can find new ways to help our potential clients be more effective with their advertising, funds, and data, so they can maximize profits and be revolutionary in their ways of thinking. Factor analysis and big data have already changed the game and aren’t slowing down anytime soon, so we would do best to embrace them.