Friday, September 14, 2012

Team A (Factor Analysis) - Ankit Jaiswal (Mktg)


Factor analysis is a statistical method used to describe variability among observed, correlated variables in terms of a potentially lower number of unobserved variables called factors. This analysis is based on dependent variables and independent variables.

For example, the sales of a product generally depend on factors like customer satisfaction, marketing speed, the product itself, etc. In this case, customer satisfaction, marketing speed and product are the independent variables, while sales is the dependent variable.

In factor analysis, we find out whether the independent variables are related to each other in some way or the other. We do not drop the variables; instead we combine them and try to remove the correlation between them. The independent variables are combined on the basis of their correlation: the ones that are more highly correlated are grouped together to form a cluster. The information gained about the interdependencies between observed variables can be used later to reduce the set of variables in a dataset.
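To make the idea of "correlated variables" concrete, here is a minimal sketch that computes pairwise Pearson correlations in plain Python. The car figures and the names `cars`, `pearson` and `corr` are made up for illustration; they are not the dataset used in this post.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)

# Hypothetical values for five cars (illustration only).
cars = {
    "width": [67.3, 70.3, 70.6, 71.4, 74.0],
    "length": [172.4, 192.9, 192.0, 196.6, 200.0],
    "horsepower": [140, 225, 210, 275, 300],
}

# Correlation of every variable with every other variable.
names = list(cars)
corr = {(a, b): pearson(cars[a], cars[b]) for a in names for b in names}

for a in names:
    print(a, [round(corr[(a, b)], 2) for b in names])
```

Variables whose pairwise correlations are high are the ones that factor analysis would tend to group together.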

Reasons to perform Factor Analysis
  • To reduce the number of variables and remove the correlation between them, so that we get a clearer picture of the scenario.
  • To find the common underlying theme in each group of variables and label it.

An ideal scenario for factor analysis is when there is a large number of variables and all of them are correlated with each other. In this analysis, only scale (numeric) variables are considered.
To conduct a Factor Analysis, start from the “Analyze” menu. This procedure is intended to reduce the complexity in a set of data, so we choose “Data Reduction” from the menu. The choice in this category is “Factor,” for factor analysis.



After we click on Factor, the following dialog box appears:


In this dialog box, string variables like manufacturer name, model name, etc. are not selected; only the scale variables are selected.
Under Descriptives, we check the Initial solution option.
Under Extraction, we check Scree plot and set the eigenvalue cutoff to greater than 1.
Under Rotation, we check the Varimax option.

In the output file, we first look at the Communalities table, which is as under:


In the above table, the Initial column shows that the variance of each variable is 1, and the Extraction column shows how much of that variance can be extracted from the variable. The rule of thumb is that if the extraction value for any variable is below 0.5, we drop it. In the above table, sales has the lowest extraction, i.e. 0.403. So if we remove sales from the list of variables, we see that the extraction values rise to 80%-90%, as seen under:
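The 0.5 rule of thumb can be sketched in a few lines. With principal-component extraction, the communality of a variable is the sum of its squared loadings on the retained components. The loadings below are hypothetical numbers chosen for illustration, not the values from the output above.

```python
# Hypothetical loadings on two retained components (illustration only).
loadings = {
    "price": (0.60, 0.75),
    "width": (0.85, -0.35),
    "sales": (0.50, 0.30),
}

# Communality = sum of squared loadings across the retained components.
communalities = {
    name: sum(value ** 2 for value in row) for name, row in loadings.items()
}

# Rule of thumb from the text: variables below 0.5 are candidates to drop.
low = [name for name, c in communalities.items() if c < 0.5]

for name, c in communalities.items():
    print(f"{name}: {c:.3f}")
print("candidates to drop:", low)
```

After dropping a variable, the analysis has to be rerun, because the loadings and communalities of the remaining variables change.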


After this, we copy the data of one variable, say price, to an Excel sheet. We find the average of the car prices and then the deviation of each value from that average. We also find the standard deviation of the data, which lets us compute the Z score (Deviation / Std. Dev). The properties of the Z scores are that they retain the shape of the distribution of the data, and that their mean = 0 and Std. Dev = 1.
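The same spreadsheet exercise can be sketched in Python. The prices are hypothetical, and the sketch assumes the sample standard deviation (dividing by n - 1) is the one intended; using the population standard deviation would give slightly different Z scores.

```python
import statistics

# Hypothetical car prices in thousands (illustration only).
prices = [21.5, 28.4, 42.0, 23.99, 33.95, 62.0]

mean_price = statistics.mean(prices)
std_price = statistics.stdev(prices)  # sample standard deviation (n - 1)

# Z score = (value - mean) / standard deviation
z_scores = [(p - mean_price) / std_price for p in prices]

print([round(z, 3) for z in z_scores])
print("mean of Z scores:", round(statistics.mean(z_scores), 6))
print("std dev of Z scores:", round(statistics.stdev(z_scores), 6))
```

Standardising every variable this way puts them on a common scale, which is why each variable starts with a variance of 1.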
Each component is made up of some amount of the variance of all the variables. In the table ‘Total Variance’, the Total column represents variance.


The above table shows that the first two components are not correlated and together account for almost 82% of the total variance. The remaining 18% is covered by the other 8 components. The components are listed in descending order of the variance they explain, with the ones having the highest % at the top. The other columns in the table show that we move forward with components 1 and 2, as they are not correlated.
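The percentages in such a table follow directly from the eigenvalues. The sketch below uses made-up eigenvalues for 10 standardised variables, chosen only so that the first two add up to 82%; they are not the actual values from the output. With standardised variables, the eigenvalues sum to the number of variables.

```python
# Hypothetical eigenvalues for 10 standardised variables (illustration only).
eigenvalues = [5.9, 2.3, 0.6, 0.4, 0.3, 0.2, 0.12, 0.08, 0.06, 0.04]

total = sum(eigenvalues)
percent = [100 * ev / total for ev in eigenvalues]

# Running total of the percentage of variance explained.
cumulative = []
running = 0.0
for p in percent:
    running += p
    cumulative.append(running)

# Eigenvalue-greater-than-1 rule: keep only components above the cutoff.
retained = [i + 1 for i, ev in enumerate(eigenvalues) if ev > 1]

for i, (ev, p, c) in enumerate(zip(eigenvalues, percent, cumulative), start=1):
    print(f"component {i}: total={ev:.2f}  % of variance={p:.1f}  cumulative %={c:.1f}")
print("retained components:", retained)
```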
Then we come to the Component Matrix, which shows the relation of each component with each variable.


The scree plot helps in choosing the number of components out of the total number of variables. In this case, we choose components 1 and 2, because after them the curve is almost flat, as can be seen under:


Rotated Component Matrix
This is the critical element of the Factor analysis. Rotation tries to equalise the variance across the components, making sure that the cumulative variance remains the same.
We do the rotation so that the dominant variables can be identified and we can find out what each component is made of. The rule is that a variable is selected for component 1 if it has a higher value in component 1 and a lower value in component 2 (and vice versa for component 2). The difference between the two values should be at least 0.30. The rotated matrix looks as under:
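For two components, the effect of a varimax rotation can be illustrated with a brute-force sketch: try many rotation angles and keep the one that maximises the raw varimax criterion (the variance of the squared loadings within each component). This is only an illustration under simplifying assumptions; statistical packages use an iterative algorithm, often with Kaiser normalisation, and the loadings here are hypothetical.

```python
import math

def rotate(loadings, angle):
    """Apply an orthogonal rotation by `angle` radians to two-column loadings."""
    c, s = math.cos(angle), math.sin(angle)
    return [(a * c + b * s, -a * s + b * c) for a, b in loadings]

def varimax_criterion(loadings):
    """Sum over both components of the variance of the squared loadings."""
    n = len(loadings)
    total = 0.0
    for j in range(2):
        squares = [row[j] ** 2 for row in loadings]
        mean = sum(squares) / n
        total += sum((x - mean) ** 2 for x in squares) / n
    return total

def varimax_two_components(loadings, steps=900):
    """Search angles from 0 to just under 90 degrees in 0.1 degree steps."""
    angles = [math.radians(90 * k / steps) for k in range(steps)]
    best = max(angles, key=lambda t: varimax_criterion(rotate(loadings, t)))
    return rotate(loadings, best)

# Hypothetical unrotated loadings (component 1, component 2).
unrotated = [(0.70, 0.60), (0.75, 0.55), (0.65, 0.62),
             (0.72, -0.58), (0.68, -0.63), (0.74, -0.52)]
rotated = varimax_two_components(unrotated)

# Rotation redistributes variance between components but keeps each
# variable's communality (and so the cumulative variance) unchanged.
before = [a * a + b * b for a, b in unrotated]
after = [a * a + b * b for a, b in rotated]

for row in rotated:
    print(tuple(round(v, 2) for v in row))
```

After rotation each variable tends to load strongly on one component and weakly on the other, which is what makes the components easier to label.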


In the above figure we can see that, in component 1, variables like wheelbase, width and length have greater values, and all of these describe the specifications of the car, so we label it SPECS. In component 2, variables like 4-year resale value, price and horsepower have greater values, so we label it PRICE, as all of them except horsepower relate to price. This suggests that there is a possibility of a 3rd component.
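The selection rule with the 0.30 difference can be written as a small helper. The function name `assign_components` and the rotated loadings below are hypothetical, used only to show how a variable such as horsepower might fail the rule.

```python
def assign_components(rotated_loadings, min_gap=0.30):
    """Assign each variable to its dominant component (1-based), or None
    if its top two absolute loadings differ by less than `min_gap`."""
    assigned = {}
    for name, loads in rotated_loadings.items():
        ranked = sorted(range(len(loads)), key=lambda j: abs(loads[j]), reverse=True)
        top, runner_up = ranked[0], ranked[1]
        if abs(loads[top]) - abs(loads[runner_up]) >= min_gap:
            assigned[name] = top + 1
        else:
            assigned[name] = None  # no clearly dominant component
    return assigned

# Hypothetical rotated loadings (component 1, component 2).
rotated_loadings = {
    "wheelbase": (0.93, 0.10),
    "width": (0.88, 0.25),
    "length": (0.90, 0.18),
    "price": (0.05, 0.94),
    "resale_4yr": (-0.02, 0.95),
    "horsepower": (0.55, 0.70),
}

groups = assign_components(rotated_loadings)
for name, component in groups.items():
    print(name, "->", component)
```

A variable left unassigned in this way is one that does not sit cleanly in either group, which is the kind of signal that hints at an additional component.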
