Discriminant function analysis (DA) undertakes the
same task as multiple linear regression: predicting an outcome. However,
multiple linear regression is limited to cases where the dependent variable (Y)
is measured at interval level, so that the regression equation combines the
weighted predictors (X values) to produce an estimated population mean of Y
for given values of X. But many
interesting variables are categorical, such as political party voting
intention, migrant/non-migrant status, making a profit or not, holding a
particular credit card, owning, renting or paying a mortgage for a house,
employed/unemployed, satisfied versus dissatisfied employees, which customers
are likely to buy a product or not, what distinguishes Stellar Bean clients
from Gloria Beans clients, whether a person is a credit risk or not, etc.
DA is used when:
• The dependent variable is categorical and the
predictor IVs are at interval level, such as age, income, attitudes, perceptions,
and years of education, although dummy variables can be used as predictors as
in multiple regression. In logistic regression, by contrast, IVs can be of any
level of measurement.
• There are more than two DV categories,
unlike binary logistic regression, which is limited to a dichotomous dependent
variable.
Assumptions of discriminant analysis
The major underlying assumptions of DA are:
• The observations are a random sample;
• Each predictor variable is normally distributed;
• Each case is correctly allocated to its
dependent category in the initial classification;
• There must be at least two groups or
categories, with each case belonging to only one group so that the groups are
mutually exclusive and collectively exhaustive (all cases can be placed in a
group);
• Each group or category must be well
defined, clearly differentiated from any other group(s) and natural. Putting a
median split on an attitude scale is not a natural way to form groups.
Partitioning quantitative variables is only justifiable if there are easily
identifiable gaps at the points of division, for instance three groups formed
from the three available levels of housing loan amount;
• The groups or categories should be
defined before collecting the data;
• The attribute(s) used to separate the
groups should discriminate clearly between the groups, so that group or
category overlap is non-existent or minimal;
• Group sizes of the dependent should not
be grossly different, and each should be at least five times the number of
independent variables.
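The group-size rule of thumb in the last assumption can be checked before running the analysis. The following is only a minimal sketch: the function name, the made-up labels and the `max_imbalance` threshold for "grossly different" are assumptions for illustration, not values taken from the text.

```python
from collections import Counter

def check_group_sizes(labels, n_predictors, min_ratio=5, max_imbalance=3.0):
    """Return group counts plus two rule-of-thumb flags."""
    counts = Counter(labels)
    smallest = min(counts.values())
    largest = max(counts.values())
    return {
        "counts": dict(counts),
        # every group should hold at least min_ratio cases per predictor
        "enough_cases": smallest >= min_ratio * n_predictors,
        # assumed cut-off for "grossly different" group sizes
        "balanced": largest / smallest <= max_imbalance,
    }

# Made-up example: 30 smokers, 45 non-smokers, 5 predictors
labels = ["smoker"] * 30 + ["non-smoker"] * 45
report = check_group_sizes(labels, n_predictors=5)
print(report)
```

With five predictors, each group needs at least 25 cases to satisfy the five-times rule, so both groups in this made-up sample pass.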
There are several purposes of DA:
• To investigate differences between groups
on the basis of the attributes of the cases, indicating which attributes
contribute most to group separation. This descriptive technique successively
identifies the linear combinations of attributes, known as canonical discriminant
functions (equations), that contribute maximally to group separation.
• Predictive DA addresses the question of
how to assign new cases to groups. The DA function uses a person’s scores on
the predictor variables to predict the category to which the individual
belongs.
• To determine the most parsimonious way to
distinguish between groups.
• To classify cases into groups.
Statistical significance tests using chi square enable you
to see how well the function separates the groups.
• To test theory by checking whether cases are
classified as predicted.
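One common way to obtain such a chi-square is Bartlett's approximation, which converts Wilks' lambda into a chi-square statistic. The sketch below assumes that approach. The function name and the input values are invented for illustration, and the figures SPSS prints may be computed differently in detail.

```python
import math

def wilks_chi_square(wilks_lambda, n_cases, n_predictors, n_groups):
    """Bartlett's chi-square approximation for Wilks' lambda.

    Returns the chi-square statistic and its degrees of freedom
    for the test of all discriminant functions taken together.
    """
    multiplier = n_cases - 1 - (n_predictors + n_groups) / 2
    chi2 = -multiplier * math.log(wilks_lambda)
    df = n_predictors * (n_groups - 1)
    return chi2, df

# Made-up values: lambda = 0.6, 100 cases, 5 predictors, 2 groups
chi2, df = wilks_chi_square(0.6, n_cases=100, n_predictors=5, n_groups=2)
print(round(chi2, 2), df)
```

A smaller Wilks' lambda (less unexplained variance) gives a larger chi-square, which is why a significant chi-square indicates that the function separates the groups.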
SPSS activity – discriminant analysis
This activity runs a discriminant analysis on a data file that
includes demographic data and scores on various questionnaires. ‘smoke’ is a
nominal variable indicating whether the employee smoked or not. The other
variables to be used are age, days absent sick from work last year,
self-concept score, anxiety score and attitudes to anti-smoking at work score.
The aim of the analysis is to determine whether these variables will
discriminate between those who smoke and those who do not. This is a simple
discriminant analysis with only two groups in the DV. With three or more DV
groupings a multiple discriminant analysis is involved; this follows the
same process in SPSS as described below, except that there will be more than one set
of eigenvalues, Wilks’ lambdas and beta coefficients. The number of sets is
one less than the number of DV groups (or equal to the number of predictors, if
that is smaller).
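The rule for the number of sets can be written as a one-line calculation. This is a small sketch with a made-up function name; it also applies the standard cap that there cannot be more discriminant functions than predictors.

```python
def n_discriminant_functions(n_groups, n_predictors):
    """Number of canonical discriminant functions that can be extracted."""
    return min(n_groups - 1, n_predictors)

# The smoke example: 2 groups, 5 predictors
print(n_discriminant_functions(2, 5))  # 1
# A made-up case: 4 groups but only 2 predictors
print(n_discriminant_functions(4, 2))  # 2
```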
1 Analyse >> Classify >> Discriminant.
2 Select ‘smoke’ as your grouping variable and enter it into the Grouping Variable box (Fig. 25.4).
Figure 25.4 Discriminant analysis dialogue box.
3 Click the Define Range button and enter the lowest and highest codes for your groups (here 1 and 2) (Fig. 25.5).
4 Click Continue.
5 Select your predictors (IVs) and enter them into the Independents box (Fig. 25.6), then select Enter Independents Together. If you planned a stepwise analysis, you would instead select Use Stepwise Method at this point.
6 Click the Statistics button and select Means, Univariate ANOVAs, Box’s M, Unstandardized and Within-Groups Correlation (Fig. 25.7).
Figure 25.5 Define range box.
Figure 25.6 Discriminant analysis dialogue box.
8 Continue >> Save and select Predicted Group Membership and Discriminant Scores (Fig. 25.9).
9 OK.
Figure 25.7 Discriminant analysis statistics box.
Figure 25.8 Discriminant analysis classification box.
Interpreting the printout (Tables 25.1 to 25.12)
The initial case processing summary as usual
indicates sample size and any missing data.
Group statistics tables
In discriminant analysis we are trying to predict
group membership, so we first examine whether there are any significant
differences between groups on each of the independent variables, using the group
means and ANOVA results. The Group Statistics and Tests of Equality of
Group Means tables provide this information. If there are no significant group
differences it is not worthwhile proceeding any further with the analysis. A
rough idea of variables that may be important can be obtained by inspecting the
group means.
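To make the logic of these two tables concrete, the sketch below computes group means, a one-way ANOVA F and the univariate Wilks' lambda (within-groups sum of squares divided by total sum of squares) for a single predictor. The function name and the six data values are invented for illustration; they are not the data set analysed here.

```python
from statistics import mean

def univariate_test(values, groups):
    """Group means, one-way ANOVA F and Wilks' lambda for one predictor."""
    by_group = {}
    for value, group in zip(values, groups):
        by_group.setdefault(group, []).append(value)

    grand_mean = mean(values)
    means = {g: mean(vs) for g, vs in by_group.items()}

    # Between-groups and within-groups sums of squares
    ss_between = sum(len(vs) * (means[g] - grand_mean) ** 2
                     for g, vs in by_group.items())
    ss_within = sum((v - means[g]) ** 2
                    for g, vs in by_group.items() for v in vs)

    df_between = len(by_group) - 1
    df_within = len(values) - len(by_group)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    wilks = ss_within / (ss_between + ss_within)
    return means, f_stat, wilks

# Made-up days-absent scores for three smokers and three non-smokers
days_absent = [8, 10, 12, 2, 4, 6]
smoke = ["yes", "yes", "yes", "no", "no", "no"]
means, f_stat, wilks = univariate_test(days_absent, smoke)
print(means, f_stat, round(wilks, 3))
```

A large F with a small Wilks' lambda for a variable suggests that it differs between the groups and may be a useful discriminator.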
Figure 25.9 Discriminant analysis save box.
Akshith M - 14004
Group A