Monday, September 3, 2012

Day 1 - Team A




The ultimate authority must always rest with the individual's own reason and critical analysis!

This was our first class of Business Analytics (BA): expectations were high and the learning curve was at its ignition point. We started with a world-famous software package, SPSS, and its role in the Business Analytics domain.

So what is SPSS? (Courtesy: Google :))
SPSS (originally, Statistical Package for the Social Sciences) is a computer program used for survey authoring and deployment (IBM SPSS Data Collection), data mining (IBM SPSS Modeler), text analytics, statistical analysis, and collaboration and deployment (batch and automated scoring services). In simple words, when you are not sure how to use the huge chunks of data collected from surveys and questionnaires, SPSS can guide you, or at least it claims to.

So what did we do with SPSS today? (Courtesy: Marketing Class 2011-13 batch)
We started our sacred journey with a few basic but important steps.

  1. Downloaded the evaluation version of SPSS version 15.
  2. Installed it.
  3. Opened it. Quite simple so far.
Initially, we learnt about the following attributes of a variable (columns of the Variable View):
·         Variable types – SPSS offers types such as Numeric, Comma, Dot and String. We learnt that many European countries write numbers in the Dot style (a period to group thousands and a comma as the decimal mark), whereas countries such as the US and India use the Comma style (commas to group digits and a period for decimals). Strings are used for alphanumeric characters.

·         Width – This sets the number of characters that can be entered in a cell for that variable.

·         Label – It helps us to provide a detailed description of the variable.

·         Values – Value labels give a meaning to each numeric code (e.g., 1 = Male, 2 = Female), so they are generally used for variables that can be categorised. Variables can be divided into:

o     Categorical (used mainly for first-level analysis)
o    Quantitative (used for second-level analysis). Quantitative variables can be further classified as continuous or discrete.

·         Missing Values – Sometimes a respondent doesn't answer a particular question, for reasons such as privacy, not understanding the question, or the question not applying to them; this leads to missing values. Recording missing values is important because it helps us correct any mistakes in framing the questions that might have led to no response from the respondent.



·         Measure – This is of 3 types, namely,

o   Nominal – A nominal scale is like a namesake: it simply places data into categories, without any order or structure. A nominal value tells us only which category something belongs to, nothing more.

Location      Value
Delhi         50
Chennai       60
Bangalore     80
Hyderabad     10

The locations in the above table are only names: they have no inherent order, and there is no meaningful distance between them.

o   Ordinal – Here the numbers follow an order, but they do not tell us by how much one value differs from another. An ordinal scale lets us interpret only the rank order, not the relative distances between positions. Suitable statistics include the median and mode, rank-order correlation and non-parametric analysis of variance.

o   Scale – This tells us both the order and by how much one value differs from another, so differences between values are meaningful (SPSS uses Scale for interval and ratio data, e.g., age in years).
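In practice, the measurement level decides which summary statistics make sense. The minimal Python sketch below is only an illustration of that rule of thumb, not how SPSS itself works; the function name `summarise` and all sample values are hypothetical. It reports the mode for nominal data, adds a median for ordinal data, and adds the mean only for scale data.

```python
from statistics import mean, median, median_low, mode


def summarise(values, measure):
    """Return the summaries that are meaningful for an SPSS-style measurement level."""
    if measure == "nominal":
        # Unordered categories: only counting-based summaries such as the mode.
        return {"mode": mode(values)}
    if measure == "ordinal":
        # Ordered ranks: a middle rank is meaningful (median_low keeps it an
        # actual category), but distances between ranks are not, so no mean.
        return {"mode": mode(values), "median": median_low(values)}
    if measure == "scale":
        # Equal intervals: differences, and therefore the mean, are meaningful.
        return {"mode": mode(values), "median": median(values), "mean": mean(values)}
    raise ValueError(f"unknown measure: {measure!r}")


print(summarise(["Delhi", "Chennai", "Delhi", "Bangalore"], "nominal"))
print(summarise([1, 2, 2, 3, 4], "ordinal"))  # hypothetical ratings on a 1-4 scale
print(summarise([20, 25, 25, 30], "scale"))   # hypothetical ages in years
```

Asking for the mean of city names would simply fail, which mirrors why SPSS users should set the Measure column carefully before analysis.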



Descriptive Statistics in SPSS
·         Frequencies: The frequencies procedure is primarily used for discrete data (e.g., nominal and ordinal data), although there are a number of options that are useful for scale level data.

·         Cross Tabulation: A process by which two or more variables are tabulated together so that their relationship is displayed in tabular form. It generates information about bivariate relationships. It is not suitable for continuous variables that take many different values; in such a case, these continuous variables are recoded into categorical variables.
For example, age can be any whole number from 12 to 26 years, and the task is to find out how many people marry before the age of 20. Cross-tabulating every single age would be very tedious, so age is recoded into two groups, 12-19 and 20-26.
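The Frequencies procedure and the recoding idea can be sketched in a few lines of Python. This is a hypothetical illustration, not SPSS code: `recode_age` mirrors the 12-19 / 20-26 grouping above, and `frequency_table` imitates the Frequency, Percent, Valid Percent and Cumulative Percent columns of an SPSS frequency table. The sample ages are made up, and ages outside the example range are assumed to be missing.

```python
from collections import Counter


def recode_age(age):
    """Recode a continuous age into the two groups used in the example."""
    if 12 <= age <= 19:
        return "12-19"
    if 20 <= age <= 26:
        return "20-26"
    return None  # outside 12-26: treated as missing


def frequency_table(values):
    """Return (category, frequency, percent, valid percent, cumulative percent) rows.

    Percent uses all cases; Valid Percent and Cumulative Percent use only
    non-missing cases, as in an SPSS Frequencies table.
    """
    counts = Counter(v for v in values if v is not None)
    total = len(values)
    n_valid = sum(counts.values())
    rows, cumulative = [], 0.0
    for category in sorted(counts):
        freq = counts[category]
        valid_pct = 100 * freq / n_valid
        cumulative += valid_pct
        rows.append((category, freq, round(100 * freq / total, 1),
                     round(valid_pct, 1), round(cumulative, 1)))
    return rows


# Hypothetical ages; 30 falls outside the example range and becomes missing.
ages = [12, 15, 18, 19, 20, 22, 25, 26, 30]
for row in frequency_table([recode_age(a) for a in ages]):
    print(row)
```

Because one of the nine cases is missing, each group's Percent (44.4) is lower than its Valid Percent (50.0).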
Suppose we are given a data sheet, e.g., the data sheet of the US Census survey 1993, and from this huge collection of data we are asked to find a relation between "Is life dull or exciting?" and "Age when first married". Is this tough? Not so much; we simply follow a few steps:
  1. First, we create a hypothesis. In this case: people who have an exciting life marry early! (Personally I have no clue, but let's see how SPSS handles this hypothesis.) The null hypothesis is that no such relation exists.
  2. Grouping values into categories simplifies the picture, so we do the same here. The age at first marriage ranges from 13 to 58 years, which is tedious to read in the output sheet, so we recode it into two groups: 13-21 (early marriage) and 22-58 (late marriage). In SPSS this is done with Transform -> Recode into Different Variables -> select the input variable (here, age when first married) -> name the new output variable and map the old values to the new categories.
  3. Once the new variable is created, we go to Analyze -> Descriptive Statistics -> Crosstabs. We put the first part of the hypothesis, whether life is exciting or not, as the row variable and the other variable, age when first married (recoded), as the column variable. Other options can be customised: we can display cell percentages, and under Statistics we can request a chi-square test to check whether the association is statistically significant.
  4. The output sheet then shows the result, and we can check whether the null hypothesis is rejected. If we choose a 95% confidence level (significance level 0.05), the null hypothesis is rejected when the p-value (Asymp. Sig.) of the Pearson chi-square test is less than 0.05. Let us look at the results in the output sheet.
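Steps 2-4 can be sketched in Python under stated assumptions: the responses below are made-up numbers (not the survey data), the helper names are hypothetical, and the test shown is the plain Pearson chi-square for a 2x2 table, whose p-value for 1 degree of freedom equals erfc(sqrt(chi2 / 2)). SPSS reports this Pearson statistic in its Chi-Square Tests table alongside other variants such as the continuity correction.

```python
import math
from collections import Counter


def recode_marriage_age(age):
    """Step 2: recode age at first marriage into the two groups used above."""
    if 13 <= age <= 21:
        return "early"
    if 22 <= age <= 58:
        return "late"
    return None  # outside 13-58: treated as missing


def crosstab(row_var, col_var):
    """Step 3: count each (row, column) combination, as Crosstabs does."""
    return Counter(zip(row_var, col_var))


def pearson_chi_square_2x2(table, row_levels, col_levels):
    """Step 4: Pearson chi-square statistic and p-value for a 2x2 table (df = 1)."""
    row_tot = {r: sum(table[(r, c)] for c in col_levels) for r in row_levels}
    col_tot = {c: sum(table[(r, c)] for r in row_levels) for c in col_levels}
    n = sum(row_tot.values())  # only cells inside the table; missing pairs are skipped
    chi2 = 0.0
    for r in row_levels:
        for c in col_levels:
            expected = row_tot[r] * col_tot[c] / n
            chi2 += (table[(r, c)] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X >= chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def decide(p_value, alpha=0.05):
    """Reject the null hypothesis only when p < alpha."""
    return "reject H0" if p_value < alpha else "fail to reject H0"


# Made-up responses for 80 hypothetical respondents.
life = ["exciting"] * 40 + ["dull"] * 40
age_married = [19] * 30 + [25] * 10 + [19] * 10 + [25] * 30
married = [recode_marriage_age(a) for a in age_married]

table = crosstab(life, married)
chi2, p = pearson_chi_square_2x2(table, ["exciting", "dull"], ["early", "late"])
print(f"chi-square = {chi2:.2f}, p = {p:.6f}: {decide(p)}")
```

With these invented counts the statistic is 20 and the p-value is far below 0.05. With real survey data the same decision rule applies, and a p-value above 0.05 only means the data do not provide enough evidence of a relation.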


After that, we move on to the amateur stage:

  Once we open the SPSS program, the dataset window shows two sheets:
       a) Data View, which holds all the data in coded format
       b) Variable View, which holds the attributes of the data present in the Data View.
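The split between the two sheets can be pictured as data plus metadata. In the hypothetical Python sketch below (the variable names, codes and labels are made up, and this is not how SPSS stores its files), `data_view` holds coded rows, `variable_view` holds each variable's label, value labels, missing-value codes and measure, and `decode` combines the two.

```python
# Hypothetical Variable View: the attributes of each variable.
variable_view = {
    "gender": {"label": "Respondent's gender", "values": {1: "Male", 2: "Female"},
               "missing": {9}, "measure": "nominal"},
    "age": {"label": "Age in years", "values": {}, "missing": {99},
            "measure": "scale"},
}

# Hypothetical Data View: one row per respondent, stored as codes.
data_view = [
    {"gender": 1, "age": 34},
    {"gender": 2, "age": 99},  # 99 = age not given
    {"gender": 9, "age": 27},  # 9 = gender not given
]


def decode(row, view):
    """Translate one coded Data View row using the Variable View attributes."""
    decoded = {}
    for name, code in row.items():
        attrs = view[name]
        if code in attrs["missing"]:
            decoded[name] = None  # user-missing code: left out of the analysis
        else:
            # Replace the code with its value label if one is defined.
            decoded[name] = attrs["values"].get(code, code)
    return decoded


for row in data_view:
    print(decode(row, variable_view))
```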

Before adjusting the values in Variable View, we should gather some basic knowledge about types of measurement, types of data variables and some basic statistical terminology.
These were some of the basics we ought to understand before handling analysis in SPSS.

Let’s now try to relate integral aspects of SPSS and analytics (The Pro Mode).


The output table shows that the significance value (p-value) of the chi-square test is greater than 0.05, so we cannot reject the null hypothesis (no such relation exists): the data do not support our hypothesis.

The above example was just an illustration of how we can use SPSS as a tool to check the validity of a hypothesis using only Crosstabs and Frequencies. However, a few points should be considered before using SPSS:
a.       Variables and their nomenclatures
b.      Type of Measurement scale used: Nominal, ordinal or scale depending upon the data type and the context.
c.       Variables that are to be used for finding relation or for various analysis
d.      Knowledge of basic statistical tools such as chi-square tests and confidence intervals, in order to make the analysis more valid.
e.       Lastly, do not depend only on SPSS for results; also use personal reasoning and judgement to reach a proper conclusion.


These are some of the things we experienced in this first session of Business Analytics. We hope our knowledge and acumen keep growing with such sessions.


Posted by:
Abhik Chakraborty (Roll 14002)
Ankit Jaiswal (Roll 14007) 


