Business Analytics Workshop SIBM 2011 Marketing: Day 9: Team A

The purposes of discriminant analysis (DA)

Discriminant Function Analysis (DA) undertakes the same task as multiple linear regression by predicting an outcome. However, multiple linear regression is limited to cases where the dependent variable on the Y axis is an interval variable, so that the combination of predictors will, through the regression equation, produce estimated mean population numerical Y values for given values of weighted combinations of X values. But many interesting variables are categorical, such as political party voting intention, migrant/non-migrant status, making a profit or not, holding a particular credit card, owning, renting or paying a mortgage for a house, employed/unemployed, satisfied versus dissatisfied employees, which customers are likely to buy a product or not buy, what distinguishes Stellar Bean clients from Gloria Beans clients, whether a person is a credit risk or not, etc.

DA is used when:

• The dependent variable is categorical and the predictor IVs are at interval level, such as age, income, attitudes, perceptions, and years of education, although dummy variables can be used as predictors as in multiple regression. In logistic regression, by contrast, the IVs can be of any level of measurement.

• There are more than two DV categories, unlike binary logistic regression, which is limited to a dichotomous dependent variable.

Assumptions of discriminant analysis

The major underlying assumptions of DA are:

• The observations are a random sample;

• Each predictor variable is normally distributed;

• Each case in the initial classification is correctly allocated to its dependent category;

• There must be at least two groups or categories, with each case belonging to only one group so that the groups are mutually exclusive and collectively exhaustive (all cases can be placed in a group);

• Each group or category must be well defined, clearly differentiated from any other group(s) and natural. Putting a median split on an attitude scale is not a natural way to form groups. Partitioning quantitative variables is only justifiable if there are easily identifiable gaps at the points of division, for instance three groups of borrowers taking the three available levels of housing loan amount;

• The groups or categories should be defined before collecting the data;

• The attribute(s) used to separate the groups should discriminate quite clearly between the groups so that group or category overlap is clearly non-existent or minimal;

• Group sizes of the dependent should not be grossly different and should be at least five times the number of independent variables.
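The group-size assumption in the last bullet can be checked before running the analysis. The following is a minimal sketch in Python, not part of the SPSS procedure. The function name `check_group_sizes`, the sample counts, and the `max_imbalance` threshold are all assumptions made for illustration; in particular, the text does not quantify "grossly different", so the ratio of 3 is an arbitrary choice.

```python
from collections import Counter


def check_group_sizes(labels, n_predictors, min_ratio=5, max_imbalance=3.0):
    """Check the group-size rules of thumb for discriminant analysis.

    min_ratio comes from the text (at least five cases per predictor in
    every group). max_imbalance is an assumed, arbitrary threshold for
    "grossly different" group sizes.
    """
    counts = Counter(labels)
    smallest = min(counts.values())
    largest = max(counts.values())
    return {
        "counts": dict(counts),
        "enough_groups": len(counts) >= 2,
        "enough_cases": smallest >= min_ratio * n_predictors,
        "balanced": largest / smallest <= max_imbalance,
    }


# Hypothetical sample: 30 smokers, 45 non-smokers, 5 predictors
labels = ["smoker"] * 30 + ["non-smoker"] * 45
report = check_group_sizes(labels, n_predictors=5)
print(report)
```

With five predictors the smallest group needs at least 25 cases, so this hypothetical sample passes.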

There are several purposes of DA:

• To investigate differences between groups on the basis of the attributes of the cases, indicating which attributes contribute most to group separation. The descriptive technique successively identifies the linear combination of attributes known as canonical discriminant functions (equations) which contribute maximally to group separation.

• Predictive DA addresses the question of how to assign new cases to groups. The DA function uses a person’s scores on the predictor variables to predict the category to which the individual belongs.

• To determine the most parsimonious way to distinguish between groups.

• To classify cases into groups. Statistical significance tests using chi square enable you to see how well the function separates the groups.

• To test theory by observing whether cases are classified as predicted.
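To make the idea of a discriminant function concrete, here is a minimal sketch of a two-group linear discriminant with two predictors, written in plain Python. It is an illustration under stated assumptions, not the SPSS algorithm: the function names and the toy data (days absent, anxiety score) are hypothetical, the weights are computed as the inverse pooled within-groups covariance matrix times the difference in group means, and the cut-off is the midpoint between the two group mean scores, which assumes equal prior probabilities. SPSS reports its coefficients on a different scale, so these numbers would not match its output.

```python
def column_means(group):
    """Mean of each of the two predictors for a list of (x1, x2) cases."""
    n = len(group)
    return [sum(c[0] for c in group) / n, sum(c[1] for c in group) / n]


def pooled_covariance(g1, g2):
    """Pooled within-groups 2x2 covariance matrix."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for group in (g1, g2):
        m = column_means(group)
        for case in group:
            d = [case[0] - m[0], case[1] - m[1]]
            for i in range(2):
                for j in range(2):
                    s[i][j] += d[i] * d[j]
    df = len(g1) + len(g2) - 2
    return [[s[i][j] / df for j in range(2)] for i in range(2)]


def score(case, w):
    """Discriminant score: weighted combination of the two predictors."""
    return w[0] * case[0] + w[1] * case[1]


def fit_discriminant(g1, g2):
    """Return weights w = S^-1 (m1 - m2) and the midpoint cut-off score."""
    m1, m2 = column_means(g1), column_means(g2)
    s = pooled_covariance(g1, g2)
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    w = [(s[1][1] * diff[0] - s[0][1] * diff[1]) / det,
         (s[0][0] * diff[1] - s[1][0] * diff[0]) / det]
    cut = 0.5 * (score(m1, w) + score(m2, w))
    return w, cut


def classify(case, w, cut):
    """Group 1 scores above the cut-off, group 2 below it."""
    return 1 if score(case, w) > cut else 2


# Hypothetical cases: (days absent, anxiety score)
group1 = [(8, 30), (10, 34), (9, 28), (12, 33)]
group2 = [(3, 20), (4, 24), (2, 19), (5, 22)]
w, cut = fit_discriminant(group1, group2)
print(classify((11, 32), w, cut))  # a new case to assign
```

The descriptive use of DA corresponds to inspecting `w` (which predictors carry the separation), and the predictive use corresponds to calling `classify` on a new case.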

SPSS activity – discriminant analysis

This example runs a discriminant analysis on a data file that includes demographic data and scores on various questionnaires. ‘smoke’ is a nominal variable indicating whether the employee smoked or not. The other variables to be used are age, days absent sick from work last year, self-concept score, anxiety score and attitudes to anti-smoking at work score. The aim of the analysis is to determine whether these variables will discriminate between those who smoke and those who do not. This is a simple discriminant analysis with only two groups in the DV. With three or more DV groupings a multiple discriminant analysis is involved, but this follows the same process in SPSS as described below except there will be more than one set of eigenvalues, Wilks’ Lambdas and beta coefficients. The number of sets is one less than the number of DV groups, or equal to the number of predictors if that is smaller.

• 1 Analyse >> Classify >> Discriminant.

• 2 Select ‘smoke’ as your grouping variable and enter it into the Grouping Variable box (Fig. 25.4).

Figure 25.4 Discriminant analysis dialogue box.

• 3 Click Define Range button and enter the lowest and highest code for your groups (here it is 1 and 2) (Fig. 25.5).

• 4 Click Continue.

• 5 Select your predictors (IVs) and enter them into the Independents box (Fig. 25.6), then select Enter Independents Together. If you planned a stepwise analysis you would at this point select Use Stepwise Method instead.

• 6 Click on the Statistics button and select Means, Univariate ANOVAs, Box’s M, Unstandardized and Within-Groups Correlation (Fig. 25.7).

Figure 25.5 Define range box.

Figure 25.6 Discriminant analysis dialogue box.

• 7 Continue >> Classify. Select Compute From Group Sizes, Summary Table, Leave One Out Classification, Within Groups, and all Plots (Fig. 25.8).

• 8 Continue >> Save and select Predicted Group Membership and Discriminant Scores (Fig. 25.9).

• 9 OK.
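Three of the options chosen in step 7 (Compute From Group Sizes, Summary Table and Leave One Out Classification) can be illustrated with a small Python sketch. This is not what SPSS runs internally: a simple nearest-group-mean rule on a single predictor stands in for the full discriminant function, and the function names and data (days absent, with group codes 1 and 2 as in the Define Range step) are hypothetical.

```python
def priors_from_group_sizes(cases):
    """Prior probability of each group = its share of the sample."""
    counts = {}
    for _, label in cases:
        counts[label] = counts.get(label, 0) + 1
    return {label: n / len(cases) for label, n in counts.items()}


def nearest_mean_label(x, train):
    """Assign x to the group whose training mean is closest (one predictor)."""
    groups = {}
    for value, label in train:
        groups.setdefault(label, []).append(value)
    means = {label: sum(v) / len(v) for label, v in groups.items()}
    return min(means, key=lambda label: abs(x - means[label]))


def leave_one_out_table(cases):
    """Classify each case with a rule built from all the other cases.

    Returns a summary table keyed by (actual group, predicted group).
    """
    table = {}
    for i, (x, actual) in enumerate(cases):
        train = cases[:i] + cases[i + 1:]
        predicted = nearest_mean_label(x, train)
        table[(actual, predicted)] = table.get((actual, predicted), 0) + 1
    return table


# Hypothetical cases: (days absent, group code)
cases = [(2, 2), (3, 2), (4, 2), (5, 2), (9, 2),
         (8, 1), (9, 1), (10, 1), (12, 1), (4, 1)]

priors = priors_from_group_sizes(cases)
table = leave_one_out_table(cases)
hit_ratio = sum(n for (actual, pred), n in table.items() if actual == pred) / len(cases)
print(priors, table, hit_ratio)
```

Because each case is classified by a rule that never saw it, the leave-one-out hit ratio is a less optimistic estimate of how well new cases would be classified than the ordinary summary table.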

Figure 25.7 Discriminant analysis statistics box.

Figure 25.8 Discriminant analysis classification box.

Interpreting the printout (Tables 25.1 to 25.12)

The initial case processing summary as usual indicates sample size and any missing data.

Group statistics tables

In discriminant analysis we are trying to predict a group membership, so firstly we examine whether there are any significant differences between groups on each of the independent variables using group means and ANOVA results data. The Group Statistics and Tests of Equality of Group Means tables provide this information. If there are no significant group differences it is not worthwhile proceeding any further with the analysis. A rough idea of variables that may be important can be obtained by inspecting the group means.
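The per-variable test described above is a one-way ANOVA on each predictor. As a rough sketch of the arithmetic, assuming hypothetical scores for one predictor in two groups, the code below computes the univariate Wilks' Lambda (within-groups sum of squares divided by total sum of squares) and the F ratio with its degrees of freedom. The function name and data are made up for illustration, and the significance level is left out because it needs the F distribution.

```python
def equality_of_means(groups):
    """One-way ANOVA for one predictor.

    groups is a list of lists of scores, one inner list per group.
    Returns (Wilks' Lambda, F, df1, df2).
    """
    all_scores = [x for g in groups for x in g]
    n, k = len(all_scores), len(groups)
    grand_mean = sum(all_scores) / n
    # Squared deviations of each score from its own group mean
    ss_within = sum((x - sum(g) / len(g)) ** 2 for g in groups for x in g)
    # Squared deviations of each score from the grand mean
    ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
    ss_between = ss_total - ss_within
    wilks = ss_within / ss_total
    f_ratio = (ss_between / (k - 1)) / (ss_within / (n - k))
    return wilks, f_ratio, k - 1, n - k


# Hypothetical days-absent scores for the two groups
group1 = [8, 10, 9, 12]
group2 = [3, 4, 2, 5]
wilks, f_ratio, df1, df2 = equality_of_means([group1, group2])
print(round(wilks, 3), round(f_ratio, 2), df1, df2)
```

A Lambda close to 0 together with a large F indicates that the group means differ substantially on that predictor; a Lambda close to 1 indicates the predictor does little to separate the groups.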

Figure 25.9 Discriminant analysis save box.

Akshith M - 14004

Group A

Sunday, September 16, 2012
