Friday, September 14, 2012

Day 8 - Team I - Rajdeep

 

Factor Analysis as a Classification Method: An Example

Let us now return to the interpretation of the standard results from a factor analysis. We will henceforth use the term factor analysis generically to encompass both principal components and principal factors analysis. Let us assume that we are at the point in our analysis where we basically know how many factors to extract. We may now want to know the meaning of the factors, that is, whether and how we can interpret them in a meaningful manner. To illustrate how this can be accomplished, let us work "backwards," that is, begin with a meaningful structure and then see how it is reflected in the results of a factor analysis. Let us return to our satisfaction example; shown below is the correlation matrix for items pertaining to satisfaction at work and items pertaining to satisfaction at home.
STATISTICA FACTOR ANALYSIS
Correlations (factor.sta)
Casewise deletion of MD
n=100

Variable   WORK_1   WORK_2   WORK_3   HOME_1   HOME_2   HOME_3
WORK_1       1.00      .65      .65      .14      .15      .14
WORK_2        .65     1.00      .73      .14      .18      .24
WORK_3        .65      .73     1.00      .16      .24      .25
HOME_1        .14      .14      .16     1.00      .66      .59
HOME_2        .15      .18      .24      .66     1.00      .73
HOME_3        .14      .24      .25      .59      .73     1.00

The work satisfaction items are highly correlated amongst themselves, and the home satisfaction items are highly correlated amongst themselves. The correlations across the two types of items (work satisfaction items with home satisfaction items) are comparatively small. It thus seems that there are two relatively independent factors reflected in the correlation matrix, one related to satisfaction at work, the other related to satisfaction at home.
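To make this block structure concrete, here is a minimal Python sketch (assuming NumPy and pandas; since the raw factor.sta data set is not reproduced in this post, the correlation matrix is typed in directly from the output above):

```python
import numpy as np
import pandas as pd

# Correlation matrix typed in from the STATISTICA output above
# (the raw factor.sta data set itself is not reproduced here).
items = ["WORK_1", "WORK_2", "WORK_3", "HOME_1", "HOME_2", "HOME_3"]
R = pd.DataFrame(
    [[1.00, .65, .65, .14, .15, .14],
     [ .65, 1.00, .73, .14, .18, .24],
     [ .65, .73, 1.00, .16, .24, .25],
     [ .14, .14, .16, 1.00, .66, .59],
     [ .15, .18, .24, .66, 1.00, .73],
     [ .14, .24, .25, .59, .73, 1.00]],
    index=items, columns=items)

# Average within-block vs. between-block correlations show the two clusters.
work, home = items[:3], items[3:]
off_diag = ~np.eye(3, dtype=bool)
print("mean r among work items:", R.loc[work, work].values[off_diag].mean().round(2))
print("mean r among home items:", R.loc[home, home].values[off_diag].mean().round(2))
print("mean r across the blocks:", R.loc[work, home].values.mean().round(2))
```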

Factor Loadings. Let us now perform a principal components analysis and look at the two-factor solution. Specifically, let us look at the correlations between the variables and the two factors (or "new" variables), as they are extracted by default; these correlations are also called factor loadings.
STATISTICA FACTOR ANALYSIS
Factor Loadings (Unrotated)
Principal components

Variable    Factor 1    Factor 2
WORK_1       .654384     .564143
WORK_2       .715256     .541444
WORK_3       .741688     .508212
HOME_1       .634120    -.563123
HOME_2       .706267    -.572658
HOME_3       .707446    -.525602
Expl.Var    2.891313    1.791000
Prp.Totl     .481885     .298500

Apparently, the first factor is generally more highly correlated with the variables than the second factor. This is to be expected because, as previously described, these factors are extracted successively and will account for less and less variance overall.
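If you want to reproduce these numbers, unrotated principal-component loadings can be obtained directly from the correlation matrix by an eigendecomposition: each eigenvector is scaled by the square root of its eigenvalue. A minimal sketch, reusing the matrix R from the snippet above (the sign of each column is arbitrary, and the values will match the table only approximately because the input correlations are rounded):

```python
import numpy as np

# Eigendecompose the correlation matrix R (pandas DataFrame from the sketch
# above); loadings are eigenvectors scaled by the square roots of their
# eigenvalues.
eigvals, eigvecs = np.linalg.eigh(R.values)
order = np.argsort(eigvals)[::-1]                   # largest eigenvalue first
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

loadings = eigvecs[:, :2] * np.sqrt(eigvals[:2])    # first two factors only

print(np.round(loadings, 3))                        # item-factor correlations
print("Expl.Var:", np.round(eigvals[:2], 3))        # variance explained per factor
print("Prp.Totl:", np.round(eigvals[:2] / len(R), 3))  # proportion of total variance
```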

Rotating the Factor Structure. We could plot the factor loadings shown above in a scatterplot, with each variable represented as a point. In that plot we could rotate the axes in any direction without changing the relative locations of the points to each other; however, the actual coordinates of the points, that is, the factor loadings, would of course change. If you produce the plot for this example, it becomes evident that rotating the axes by about 45 degrees would yield a clear pattern of loadings separating the work satisfaction items from the home satisfaction items.
Rotational strategies. There are various rotational strategies that have been proposed. The goal of all of these strategies is to obtain a clear pattern of loadings, that is, factors that are somehow clearly marked by high loadings for some variables and low loadings for others. This general pattern is also sometimes referred to as simple structure (a more formalized definition can be found in most standard textbooks). Typical rotational strategies are varimax, quartimax, and equamax.
We have described the idea of the varimax rotation before and it can be applied to this problem as well. As before, we want to find a rotation that maximizes the variance on the new axes; put another way, we want to obtain a pattern of loadings on each factor that is as diverse as possible, lending itself to easier interpretation. Below is the table of rotated factor loadings.
STATISTICA FACTOR ANALYSIS
Factor Loadings (Varimax normalized)
Extraction: Principal components

Variable    Factor 1    Factor 2
WORK_1       .862443     .051643
WORK_2       .890267     .110351
WORK_3       .886055     .152603
HOME_1       .062145     .845786
HOME_2       .107230     .902913
HOME_3       .140876     .869995
Expl.Var    2.356684    2.325629
Prp.Totl     .392781     .387605

Interpreting the Factor Structure. Now the pattern is much clearer. As expected, the first factor is marked by high loadings on the work satisfaction items, while the second factor is marked by high loadings on the home satisfaction items. We would thus conclude that satisfaction, as measured by our questionnaire, is composed of those two aspects; hence we have arrived at a classification of the variables.
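To make the rotation step itself concrete, below is a common SVD-based varimax implementation in plain NumPy, applied to the unrotated loadings computed earlier. This is a sketch only: the STATISTICA table above uses Kaiser-normalized varimax (rows are rescaled by their communalities before rotation), which this sketch omits, so the values will differ slightly.

```python
import numpy as np

def varimax(L, gamma=1.0, max_iter=100, tol=1e-6):
    """Rotate the loading matrix L toward the varimax criterion
    (raw varimax, without Kaiser row normalization)."""
    p, k = L.shape
    R = np.eye(k)
    crit_old = 0.0
    for _ in range(max_iter):
        Lr = L @ R
        # Gradient of the varimax criterion; see Harman (1976) for the derivation.
        B = L.T @ (Lr ** 3 - (gamma / p) * Lr @ np.diag(np.sum(Lr ** 2, axis=0)))
        U, s, Vt = np.linalg.svd(B)
        R = U @ Vt
        crit_new = np.sum(s)
        if crit_new - crit_old < tol * crit_new:    # criterion stopped improving
            break
        crit_old = crit_new
    return L @ R

rotated = varimax(loadings)      # 'loadings' from the previous sketch
print(np.round(rotated, 3))
```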
Consider another example, this time with four additional hobby/miscellaneous variables added to our earlier example.
In a plot of the factor loadings for that example (the figure is not reproduced here), the 10 variables are reduced to three specific factors: a work factor, a home factor, and a hobby/misc. factor. Note that the variables defining each factor spread across the values of the other two factors but load highly on their own factor. For example, the hobby/misc. variables have both high and low "work" and "home" loadings, but all four of them have high factor loadings on the "hobby/misc." factor.
Oblique Factors. Some authors (e.g., Cattell & Khanna; Harman, 1976; Jennrich & Sampson, 1966; Clarkson & Jennrich, 1988) have discussed in some detail the concept of oblique (non-orthogonal) factors, used in order to achieve a more interpretable simple structure. Specifically, computational strategies have been developed to rotate factors so as to best represent "clusters" of variables, without the constraint of orthogonality of factors. However, the oblique factors produced by such rotations are often not easily interpreted. To return to the example discussed above, suppose we had included in the satisfaction questionnaire four items that measured other, "miscellaneous" types of satisfaction. Let us assume that people's responses to those items were affected about equally by their satisfaction at work (Factor 1) and at home (Factor 2). An oblique rotation will likely produce two correlated factors with less-than-obvious meaning, that is, with many cross-loadings.
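For readers who want to experiment with an oblique solution, the sketch below assumes the third-party factor_analyzer package and a pandas data frame df holding raw responses to the 10 items (WORK_1 through MISCEL_4); neither is part of the original example, and promax simply stands in for whichever oblique rotation you prefer.

```python
# Oblique (promax) factor solution; df is a hypothetical pandas DataFrame of
# raw item responses, one column per questionnaire item.
from factor_analyzer import FactorAnalyzer

fa = FactorAnalyzer(n_factors=2, rotation="promax")
fa.fit(df)

print(fa.loadings_)   # pattern loadings; the "miscellaneous" items show cross-loadings
print(fa.phi_)        # correlation between the two oblique factors
```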
Hierarchical Factor Analysis. Instead of computing loadings for often difficult-to-interpret oblique factors, you can use a strategy first proposed by Thompson (1951) and Schmid and Leiman (1957), which has been elaborated and popularized in the detailed discussions by Wherry (1959, 1975, 1984). In this strategy, you first identify clusters of items and rotate axes through those clusters; next, the correlations between those (oblique) factors are computed, and that correlation matrix of oblique factors is further factor-analyzed to yield a set of orthogonal factors that divide the variability in the items into that due to shared or common variance (secondary factors) and unique variance due to the clusters of similar variables (items) in the analysis (primary factors). To return to the example above, such a hierarchical analysis might yield the following factor loadings:
STATISTICA FACTOR ANALYSIS
Secondary & Primary Factor Loadings

Factor      Second. 1   Primary 1   Primary 2
WORK_1        .483178     .649499     .187074
WORK_2        .570953     .687056     .140627
WORK_3        .565624     .656790     .115461
HOME_1        .535812     .117278     .630076
HOME_2        .615403     .079910     .668880
HOME_3        .586405     .065512     .626730
MISCEL_1      .780488     .466823     .280141
MISCEL_2      .734854     .464779     .238512
MISCEL_3      .776013     .439010     .303672
MISCEL_4      .714183     .455157     .228351

Careful examination of these loadings would lead to the following conclusions:
  1. There is a general (secondary) satisfaction factor that likely affects all types of satisfaction measured by the 10 items;
  2. There appear to be two primary unique areas of satisfaction that can best be described as satisfaction with work and satisfaction with home life.
Wherry (1984) discusses in great detail examples of such hierarchical analyses, and how meaningful and interpretable secondary factors can be derived.
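As a rough illustration of the transformation behind such a hierarchical solution (not STATISTICA's implementation), the NumPy sketch below assumes you already have an oblique primary pattern matrix P (items by primary factors) and the primary-factor correlation matrix Phi, for instance from the promax sketch earlier; it extracts a single general factor from Phi and residualizes the primary loadings accordingly.

```python
import numpy as np

def schmid_leiman(P, Phi):
    """Schmid-Leiman-style transformation with one general (secondary) factor."""
    # Loadings of the primary factors on the general factor: first principal
    # component of the factor correlation matrix, scaled to the loading metric.
    eigvals, eigvecs = np.linalg.eigh(Phi)
    g = np.abs(eigvecs[:, -1]) * np.sqrt(eigvals[-1])
    secondary = P @ g                         # item loadings on the general factor
    # Residualized primary loadings: each primary factor keeps only the part of
    # its variance not accounted for by the general factor.
    primary = P * np.sqrt(1.0 - g ** 2)
    return secondary, primary

# e.g., with the promax sketch above:
# secondary, primary = schmid_leiman(fa.loadings_, fa.phi_)
```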
Confirmatory Factor Analysis. Over the past 15 years, so-called confirmatory methods have become increasingly popular (e.g., see Jöreskog and Sörbom, 1979). In general, you can specify, a priori, a pattern of factor loadings for a particular number of orthogonal or oblique factors, and then test whether the observed correlation matrix can be reproduced given these specifications. Confirmatory factor analyses can be performed via Structural Equation Modeling (SEPATH).
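To convey the flavor of the confirmatory approach without a dedicated SEM package, the sketch below specifies a loading pattern a priori (the .8 loadings and the .2 factor correlation are arbitrary assumptions, not estimates) and checks how closely the implied correlation matrix reproduces the observed one from the first example; a real confirmatory analysis, e.g. in SEPATH, would estimate the free parameters and supply formal fit statistics.

```python
import numpy as np

# A priori pattern: WORK items load only on factor 1, HOME items only on factor 2.
L_spec = np.array([[.8, 0.0], [.8, 0.0], [.8, 0.0],
                   [0.0, .8], [0.0, .8], [0.0, .8]])
Phi = np.array([[1.0, 0.2],        # assumed correlation between the two factors
                [0.2, 1.0]])

implied = L_spec @ Phi @ L_spec.T  # correlations implied by the specification
np.fill_diagonal(implied, 1.0)     # observed variables have unit variance

residuals = R.values - implied     # R: observed correlation matrix (first sketch)
print("largest absolute residual:", np.abs(residuals).max().round(2))
```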
 
