Monday, September 3, 2012

Day 1 - Team F



The lecture began with an introduction to the SPSS tool. SPSS (Statistical Package for the Social Sciences), originally developed by Norman H. Nie and C. Hadlai Hull, is a popular analytics tool used by market researchers, health researchers, survey companies, government agencies and education researchers. For classroom practice, we use the Windows evaluation version 15.0 of SPSS. The software supports descriptive statistics, bivariate statistics, prediction of numerical outcomes and prediction for identifying groups.

SPSS has a data view (where we enter the data) and a variable view (where we define the variables and give them names). Each row entered in data view is called a ‘case’, and each column holds one variable. The ‘Type’ column in variable view offers many types for a variable. Popular among them are ‘comma’ (e.g. Rs. 10,000.00) and ‘dot’ (e.g. Rs. 10.000,00), the latter being widely used in European countries. A ‘string’ is a group of alphanumeric characters. We cannot change a string variable to numeric just by clicking; we have to recode its values as numbers.

Labels describe the variable names. They appear in the output, so that by reading a label we know what the variable means.

Variables whose codes are defined under ‘Values’ are called categorical variables and are used for first-level analysis. Continuous and discrete variables are used for second-level analysis. Most of the methods require continuous variables.

If a respondent does not provide data for a question, it is useful to analyze afterwards why he/she did not answer. From this we learn whether the question has to be rephrased or should not have been asked in the first place.

Three levels of measurement are used: nominal, ordinal and scale. Nominal numbers serve only as names and carry no intrinsic information. Ordinal and scale numbers do carry information. Ordinal numbers follow a meaningful order, and scale numbers (further classified into interval and ratio) also tell you by how much the values differ.

Analysis of the data is classified into univariate, bivariate and multivariate analysis, and each has its own types of graphs. For example, bubble graphs and radial graphs are used for multivariate analysis.

Frequency and cumulative frequency analysis is used to group the values of a variable into categories for analysis. For example, if the cumulative frequency of ‘age at first marriage’ shows that 50% of the respondents married at or below 21 years, then for analysis we can define ‘early first marriage’ as age less than or equal to 21 and ‘late first marriage’ as age greater than 21.
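
To make the cut-off idea concrete, here is a minimal Python sketch (outside SPSS) of a cumulative frequency table and a 50% cut point. The ages are made-up illustrative data, and the helper names `cumulative_percent` and `cut_point` are our own, not anything from SPSS.

```python
from collections import Counter

# Hypothetical ages at first marriage (illustrative data, not from the lecture).
ages = [18, 19, 19, 20, 21, 21, 22, 24, 25, 27, 30, 33]

def cumulative_percent(values):
    """Return (value, frequency, cumulative percent) rows, sorted by value."""
    counts = Counter(values)
    total = len(values)
    running = 0
    rows = []
    for value in sorted(counts):
        running += counts[value]
        rows.append((value, counts[value], 100.0 * running / total))
    return rows

def cut_point(values, percent=50.0):
    """Smallest value whose cumulative percent reaches the given percent."""
    for value, _, cum_pct in cumulative_percent(values):
        if cum_pct >= percent:
            return value
    raise ValueError("no data to compute a cut point from")

cut = cut_point(ages)
# Recode every case into one of two categories around the cut point.
groups = ["early" if age <= cut else "late" for age in ages]
print(cut, Counter(groups))
```

With this sample the cumulative percent reaches 50% at age 21, so the cases split evenly into ‘early’ and ‘late’.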

To find out whether two variables are related, we first set up the null hypothesis that there is no relation between them. We then examine the validity of the null hypothesis with a crosstab, placing the variable being compared in the ‘row’ section. Finally, we select the row and column percentages to see whether a relation exists.
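
As a rough illustration of what the crosstab shows, the following Python sketch builds a two-way table with row percentages. The (gender, marriage group) pairs are invented for the example, and `crosstab` and `row_percentages` are hypothetical helper names, not SPSS functions.

```python
from collections import Counter

# Hypothetical cases: (row variable, column variable). Invented data.
cases = (
    [("male", "early")] * 30 + [("male", "late")] * 20
    + [("female", "early")] * 15 + [("female", "late")] * 35
)

def crosstab(pairs):
    """Count each (row, column) combination and return labels plus the table."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    table = [[counts[(r, c)] for c in cols] for r in rows]
    return rows, cols, table

def row_percentages(table):
    """Express each cell as a percentage of its row total."""
    return [[100.0 * cell / sum(row) for cell in row] for row in table]

rows, cols, table = crosstab(cases)
percents = row_percentages(table)
for label, pct in zip(rows, percents):
    print(label, dict(zip(cols, pct)))
```

If the row percentages differ a lot between rows, as they do in this invented sample, that hints at a relation, which the chi-square test then checks formally.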

For the chi-square test, we always test at some confidence level. The confidence level depends on how critical the scenario is. For routine business decisions, we take the confidence level to be 95% (for more critical scenarios, a 99% confidence level is a must). That is, if the significance value is less than 0.05, there is a statistically significant association between the two variables under examination; in this case, we reject the null hypothesis and accept the alternative hypothesis.
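
The decision rule can be sketched in Python for the simplest case, a 2x2 table. This is only an illustration under stated assumptions: the counts are invented, `chi_square_2x2` is our own helper, it computes the plain Pearson chi-square without a continuity correction, and the p-value formula used holds only for one degree of freedom.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and p-value for a 2x2 table of counts."""
    (a, b), (c, d) = table
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = a + b + c + d
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, P(chi-square > x) equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical crosstab: rows = two groups, columns = early / late marriage.
observed = [[30, 20], [15, 35]]
chi2, p_value = chi_square_2x2(observed)

alpha = 0.05  # corresponds to the 95% confidence level
reject_null = p_value < alpha
print(round(chi2, 3), round(p_value, 4), reject_null)
```

For these invented counts the significance value is well below 0.05, so the null hypothesis of no relation would be rejected at the 95% confidence level.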

By
Unmesh Ramesh Kulkarni
Parveen Rathee
