Friday, September 14, 2012

Day 8 Team D: Factor Analysis


Definition: Factor analysis is a method of data reduction.  It does this by seeking underlying unobservable (latent) variables that are reflected in the observed (manifest) variables.  There are many different methods that can be used to conduct a factor analysis, such as principal axis factoring, maximum likelihood, generalized least squares, and unweighted least squares.  There are also many different types of rotations that can be done after the initial extraction of factors, including orthogonal rotations, such as varimax and equimax, which impose the restriction that the factors cannot be correlated, and oblique rotations, such as promax, which allow the factors to be correlated with one another.  You also need to determine the number of factors that you want to extract.  Given the number of factor analytic techniques and options, it is not surprising that different analysts could reach very different results analyzing the same data set.  However, all analysts are looking for simple structure.  Simple structure is a pattern of results such that each variable loads highly onto one and only one factor.
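To make the idea of simple structure concrete, here is a minimal sketch that checks whether each variable loads highly on exactly one factor. The loadings and the 0.5 cutoff for a "high" loading are made-up assumptions for illustration, not values from our data or from SPSS.

```python
def has_simple_structure(loadings, high=0.5):
    """Return True if every variable loads highly on exactly one factor.

    loadings: list of rows, one row per variable, one column per factor.
    high: hypothetical cutoff for a "high" loading (an assumption).
    """
    for row in loadings:
        n_high = sum(1 for value in row if abs(value) >= high)
        if n_high != 1:
            return False
    return True


# Made-up loadings for three variables on two factors.
clean = [[0.82, 0.10], [0.75, 0.05], [0.12, 0.91]]
messy = [[0.82, 0.10], [0.60, 0.62], [0.12, 0.91]]  # second variable cross-loads

print(has_simple_structure(clean))
print(has_simple_structure(messy))
```

The second matrix fails the check because one variable loads highly on both factors, which is the kind of result a rotation is meant to clean up.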

Factor analysis is a technique that requires a large sample size.  Factor analysis is based on the correlation matrix of the variables involved, and correlations usually need a large sample size before they stabilize.  Advice regarding sample size: 50 cases is very poor, 100 is poor, 200 is fair, 300 is good, 500 is very good, and 1000 or more is excellent.  As a rule of thumb, a bare minimum of 10 observations per variable is necessary to avoid computational difficulties.
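The sample size advice above can be turned into a quick check. This is only a sketch: treating each benchmark as the lower bound of a band (so that, say, 150 cases counts as "poor") is our own assumption, and the function names are hypothetical.

```python
def rate_sample_size(n_cases):
    """Label a sample size using the benchmarks quoted in the text.

    Assumption: each benchmark is the lower bound of its band.
    """
    bands = [
        (1000, "excellent"),
        (500, "very good"),
        (300, "good"),
        (200, "fair"),
        (100, "poor"),
    ]
    for minimum, label in bands:
        if n_cases >= minimum:
            return label
    return "very poor"


def meets_rule_of_thumb(n_cases, n_variables, per_variable=10):
    """Check the rule of thumb of at least 10 observations per variable."""
    return n_cases >= per_variable * n_variables


print(rate_sample_size(157))
print(meets_rule_of_thumb(157, 10))
```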

Explanation: To do the factor analysis successfully, we first have to bring all the variables onto the same scale. We can do this with a simple normalization technique; here we will use Z-score normalization.

We can explain the whole process through a simple example.
We have valid car sales data. First, we will import the data into SPSS.

Now we will go into the data view:

We will copy the price column and paste it into an Excel sheet:

Now with the above values, we will find the Z-Score value for normalization.

In column F, we have calculated the average and the standard deviation of the price.
In column B, we have calculated the deviation from the mean, which is (A1-$F$1).
In column C, we have calculated the Z value (B1/$F$3).

So, the mean of Z is AVERAGE(C:C) and the standard deviation of Z is STDEV(C:C). If the calculation is correct, these come out to approximately 0 and 1.
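The same spreadsheet steps can be sketched in Python. The prices below are made-up sample values, not our car sales data. We use the sample standard deviation, which is what Excel's STDEV computes.

```python
import statistics


def z_scores(values):
    """Z-score normalization: subtract the mean, divide by the standard deviation."""
    mean = statistics.mean(values)   # like AVERAGE(A:A)
    sd = statistics.stdev(values)    # sample SD, like Excel's STDEV(A:A)
    return [(v - mean) / sd for v in values]


# Made-up prices for illustration only.
prices = [21.5, 28.4, 42.0, 23.99, 33.95, 62.0]
z = z_scores(prices)

# The normalized column should have mean close to 0 and SD close to 1.
print(round(statistics.mean(z), 6), round(statistics.stdev(z), 6))
```

Because every variable ends up with mean 0 and standard deviation 1, variables measured in different units become directly comparable.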

Now we plot the graphs of the price column and the Z value. The comparison of the graphs is shown below:


Now, we will take all the highly correlated variables for the factor analysis.
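One way to find the highly correlated variables outside SPSS is to compute Pearson correlations for every pair of columns. The sketch below is only an illustration: the column names, the values, and the 0.7 threshold are all assumptions, not taken from our data set.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def highly_correlated_pairs(columns, threshold=0.7):
    """Return (name_a, name_b, r) for pairs with |r| at or above the threshold."""
    pairs = []
    for (name_a, a), (name_b, b) in combinations(columns.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            pairs.append((name_a, name_b, r))
    return pairs


# Hypothetical columns with made-up values.
data = {
    "price": [10, 20, 30, 40, 50],
    "engine_size": [1.0, 2.1, 2.9, 4.2, 5.0],
    "fuel_capacity": [5, 3, 6, 2, 4],
}

for name_a, name_b, r in highly_correlated_pairs(data):
    print(name_a, name_b, round(r, 3))
```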
After we run it in SPSS, we get the communalities table with the extraction values.
We will keep the variables that have extraction values of more than .5, so we remove the first variable.
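The filtering step can be sketched as follows. With orthogonal factors, a variable's communality is the sum of its squared loadings across the extracted factors. The variable names and loadings here are made up for illustration; the real extraction values come from the SPSS communalities table.

```python
def communalities(loadings):
    """Sum of squared loadings per variable (assumes orthogonal factors)."""
    return {name: sum(l * l for l in row) for name, row in loadings.items()}


def keep_variables(loadings, cutoff=0.5):
    """Keep variables whose communality is above the cutoff (.5 in the text)."""
    return [name for name, h2 in communalities(loadings).items() if h2 > cutoff]


# Hypothetical loadings of three variables on two factors.
loadings = {
    "var_a": [0.40, 0.30],  # communality 0.25, dropped
    "var_b": [0.85, 0.20],
    "var_c": [0.15, 0.80],
}

print(keep_variables(loadings))
```

A low communality means the extracted factors explain little of that variable's variance, which is why we drop it before rerunning the analysis.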

