The ultimate authority must always rest with the individual's own reason and critical analysis!
It was the first class of Business Analytics (BA): expectations were high and the learning curve was at its ignition point. We started with a world-famous software package known as SPSS and its role in the Business Analytics domain.
So what is SPSS? (Courtesy: Google)
SPSS (originally, Statistical Package for the Social Sciences) is a computer program used for survey authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text analytics, statistical analysis, and collaboration and deployment (batch and automated scoring services). In simple words, when you are not sure how to use the huge chunks of data collected from surveys and questionnaires, it can guide you in doing so, or at least it claims to.
So what did we do with SPSS today? (Courtesy: Marketing Class 2011-13 batch)
We started our sacred journey with a few basic but important steps.
- 1. Downloaded the evaluation version of SPSS version 15
- 2. Installed it
- 3. And then opened it. Quite simple till now.
Initially we started with understanding the following:
· Variable types – These include Dot, Comma, String, etc. Dot and Comma govern how figures are written: we learnt that many European countries use dots to group digits, where many other countries (such as India and the US) use commas. Strings are generally used for alphanumeric characters.
· Width – This sets the number of characters that can be put in the cell.
· Label – It lets us provide a detailed description of the variable.
· Value – This is generally used for items which can be categorised. Values can be divided into:
o Category (used mainly for 1st level analysis)
o Continuous (used for 2nd level analysis). Such numeric variables can in turn be continuous or discrete.
· Missing Numbers – Sometimes a respondent doesn’t answer a particular question, and this leads to missing numbers. It can happen for reasons like privacy, the respondent not understanding the question, or the question not being applicable to him. Recording these missing numbers is necessary because it helps us correct any mistake in framing the question which might have led to the respondent not answering.
· Measure – This is of 3 types, namely:
o Nominal – A nominal scale is a scale in name only. It simply places data into categories, without any order or structure. The numeric codes themselves tell us nothing further about the respondent.
Location | Values
Delhi | 50
Chennai | 60
Bangalore | 80
Hyderabad | 10
The values in the above table are only labels for the locations; they imply no order or magnitude.
o Ordinal – Here the numbers follow an order, but they do not tell us by how much one value differs from another. An ordinal scale only lets you interpret gross order and not the relative positional distances. Statistics suited to ordinal data include the median and mode, rank-order correlation, and non-parametric analysis of variance.
o Scale – This identifies the order and also tells us by how much one value differs from another, so differences between values are meaningful.
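To make the three measurement levels concrete, here is a minimal Python sketch (not SPSS itself) of which summary each level supports. The responses and variable names are hypothetical, invented only for illustration.

```python
from statistics import mean, median, mode

# Hypothetical survey responses, invented for illustration only.
city = ["Delhi", "Chennai", "Delhi", "Bangalore", "Delhi"]  # nominal
satisfaction = [1, 3, 2, 3, 3]  # ordinal: 1 = low, 2 = medium, 3 = high
age = [21, 25, 30, 24, 25]      # scale

# Nominal: categories have no order, so only the most common one (mode) makes sense.
most_common_city = mode(city)

# Ordinal: order is meaningful, so the median is a valid summary.
median_satisfaction = median(satisfaction)

# Scale: differences are meaningful, so the mean can be used as well.
mean_age = mean(age)

print("Most common city:", most_common_city)
print("Median satisfaction:", median_satisfaction)
print("Mean age:", mean_age)
```

Taking the mean of the city codes would run without error if they were stored as numbers, which is exactly why the Measure column has to be set with care.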
Descriptive Statistics in SPSS
· Frequencies: The frequencies procedure is primarily used for discrete data (e.g., nominal and ordinal data), although there are a number of options that are useful for scale-level data.
· Cross Tabulation: It is a process by which two or more variables are tabulated against each other, displaying their relationship in tabular form. It generates information about bi-variate relationships. It is not suitable for continuous variables that assume many values. In such a case, the continuous variables are re-categorised into category variables.
For example: Age can be 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25 and 26 years. The task is to find out how many marry before the age of 20 years. Depicting this data in a cross-tabulated form can be very tedious. Thus it can be re-categorised into the age groups 12-19 and 20-26.
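The re-categorisation in this example, followed by a simple frequency count, could be sketched in Python roughly as follows. The list of ages and the helper name `recode_age` are hypothetical, made up for illustration; in SPSS the same thing is done through the menus.

```python
from collections import Counter

def recode_age(age):
    """Map an age in years to one of the two groups used in the text."""
    return "12-19" if age <= 19 else "20-26"

# Hypothetical ages at marriage, invented for illustration only.
ages = [12, 15, 18, 19, 20, 21, 23, 26, 17, 24]

# Recode every age into its group, then count how often each group occurs.
age_group = [recode_age(a) for a in ages]
frequencies = Counter(age_group)

for group in ("12-19", "20-26"):
    print(group, frequencies[group])
```

The count for the 12-19 group answers the question of how many married before the age of 20.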
Suppose we are given a data sheet, e.g. the datasheet of the US Census survey 1993, and we are asked to find, from this huge collection of data, a relation between “Is life dull or exciting” and “Age when first married”. Is this tough? Not so much; we simply have to follow some steps:
- First we have to create a hypothesis. In this case we can consider: people who have an exciting life marry early! (Personally I don’t have any clue, but let’s see how SPSS handles this hypothesis.) The null hypothesis would be that no such relation exists.
- Since we are intelligent beings who find things simpler when they are grouped into categories, we do the same thing here. The age at first marriage ranges from 13 to 58 years, which is quite tedious to view in the output sheet, so we categorize it into 13-21 (early marriage) and 22-58 (late marriage). This is done with the SPSS recode tool (go to Transform -> Recode into Different Variables -> select the input variable, in this case age when first married -> name the new variable and assign its values according to the new nomenclature).
- Once the new variable is created, we move to the Analyze menu and click on Descriptive Statistics -> Crosstabs. Here we select the first clause of the hypothesis, in this case the variable “life is exciting or not”, as the row and the other variable, the recoded age when first married, as the column. (Other features can be customized: we can get the results in percentages, and we can use additional statistical tests such as chi-square to check whether the relationship is statistically significant.)
- After completing the above step, the output sheet shows the result and we can check whether the null hypothesis is rejected or not. Suppose we take a 95% confidence level; then if the significance (p) value of the chi-square test is less than 0.05, the null hypothesis is rejected. Let us look at the results in the output sheet.
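For readers without SPSS at hand, the crosstab and chi-square steps above can be sketched in plain Python. This is only an illustration under stated assumptions: the responses are invented, the “life” variable is simplified to two categories, and the statistic is the plain Pearson chi-square without continuity correction, with a p-value formula that holds only for a 2x2 table (1 degree of freedom).

```python
import math
from collections import Counter

# Hypothetical (life, marriage) pairs, invented for illustration only.
responses = (
    [("Exciting", "Early")] * 30 + [("Exciting", "Late")] * 20
    + [("Dull", "Early")] * 25 + [("Dull", "Late")] * 25
)

# Build the cross tabulation: rows = life, columns = recoded age at marriage.
rows = ("Exciting", "Dull")
cols = ("Early", "Late")
counts = Counter(responses)
observed = [[counts[(r, c)] for c in cols] for r in rows]

# Pearson chi-square: compare each observed count with the count
# expected under the null hypothesis of no relation.
n = len(responses)
row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
chi_square = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        expected = row_totals[i] * col_totals[j] / n
        chi_square += (obs - expected) ** 2 / expected

# With 1 degree of freedom the p-value has a closed form via erfc.
p_value = math.erfc(math.sqrt(chi_square / 2))

decision = "reject" if p_value < 0.05 else "cannot reject"
print(observed)
print("chi-square:", round(chi_square, 3), "p-value:", round(p_value, 3))
print("Null hypothesis:", decision)
```

The decision rule is the same one used with the SPSS output: the null hypothesis is rejected only when the p-value falls below 0.05.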
After that we move on to the amateur stage:
Once we open the SPSS programme we see the Dataset window, which has two sheets:
a.) Data View – which holds all the data in coded format
b.) Variable View – which holds the attributes of the data present in Data View.
Before adjusting the values in Variable View we should gather some basic knowledge about types of measurement, types of data variable and some basic statistical terminology.
These were some of the basics we ought to understand before handling analysis in SPSS.
Let’s now try to relate the integral aspects of SPSS and analytics (The Pro Mode).
The output table shows that the significance (p) value of the chi-square test is greater than 0.05, so we cannot reject the null hypothesis (no such relation exists) and our hypothesis is not supported.
The above example was just an illustration of how we can use SPSS as a tool to check the validity of a hypothesis using only crosstabs and frequencies. But some points need to be taken into consideration before using SPSS:
a. Variables and their nomenclature
b. Type of measurement scale used: nominal, ordinal or scale, depending upon the data type and the context.
c. Variables that are to be used for finding relations or for other analyses
d. Knowledge of basic statistical tools like chi-square and confidence intervals, in order to make the analysis more valid.
e. Lastly, not depending only upon SPSS for results but also using personal reasoning and judgement in order to reach a proper conclusion.
These are some of the things that we experienced in this first session of Business Analytics. We hope our knowledge and acumen get more enriched with such sessions.
Posted by:
Abhik Chakraborty (Roll 14002)
Ankit Jaiswal (Roll 14007)