Tuesday, September 4, 2012

Day 2 - Team B (Manas Mani)

Lecture 3:


We continued learning how to use SPSS, working with the Retail example sheet.






fig. 1

The figure above (fig. 1) shows a snapshot of the sheet we used in class to understand the various functions of the software. The methods taught today in lectures 3 and 4 were covered in more detail than those in the first two lectures. They were:
  1. Continued use of the Frequencies and Crosstabs options (fig. 1.1 and 1.2)
  2. Using a control variable (fig. 1.3)
  3. Select Cases (fig. 1.4)
Understanding the process used in SPSS:
  1. Accessing the Crosstabs:




The drop-down menu (fig. 1.1) shows how to access the Crosstabs option, which is the SPSS feature we use most often and the most important one for us. Clicking on Analyze opens a drop-down menu; from there we choose Descriptive Statistics and then Crosstabs (fig. 1.2).


fig. 1.2: Crosstabs option







fig. 1.1: Analyze


   2.  Crosstabs with the Control Variable:


fig. 1.3: A view of the crosstabs with the control variable (marked in the fig.)

Cross tabulation is the process of tabulating the results of two or more variables against one another. It creates a contingency table from the multivariate frequency distribution of statistical variables. Heavily used in survey research, cross tabulations (or crosstabs for short) can be produced by a range of statistical packages, including some that are specialized for the task. They give a basic picture of the interrelation of two variables and help to find interactions between them. They make it easy to zoom in on "hot spots" to see the most significant relationships between the two selected variables. A control variable (entered as a layer in the Crosstabs dialog) splits the table so that a separate crosstab is produced for each of its values.
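To see what a crosstab with a control variable actually counts, here is a minimal sketch in plain Python. It is only an illustration outside SPSS, not what SPSS runs internally. The records, the field order (gender, store type, city) and the function name `crosstab` are all made up for the example.

```python
from collections import Counter

# Hypothetical retail survey records: (gender, store_type, city).
records = [
    ("F", "Mall", "Delhi"),
    ("M", "Mall", "Delhi"),
    ("F", "Street", "Delhi"),
    ("F", "Mall", "Mumbai"),
    ("M", "Street", "Mumbai"),
    ("M", "Street", "Mumbai"),
]


def crosstab(rows, row_idx, col_idx, layer_idx=None):
    """Count cases for each (layer, row value, column value) combination.

    layer_idx plays the role of the control (layer) variable. With None,
    every case falls into a single layer keyed by None.
    """
    counts = Counter()
    for rec in rows:
        layer = rec[layer_idx] if layer_idx is not None else None
        counts[(layer, rec[row_idx], rec[col_idx])] += 1
    return counts


# Gender by store type, without and with city as the control variable.
plain = crosstab(records, 0, 1)
layered = crosstab(records, 0, 1, layer_idx=2)

for key in sorted(layered):
    print(key, layered[key])
```

The layered result holds one small gender-by-store table per city, which is the same idea as the separate tables SPSS shows for each value of the control variable.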

   3.  Select Cases:


fig. 1.4


SPSS allows us to select part of the data set for further analysis, while excluding the remaining cases from these analyses. The procedure is found by choosing Select Cases from the Data menu.


First, we have to specify how to select data and which data to retain for the analyses:
  • All cases: This option turns off any previous selection and uses all data in the file. Click on this radio button and then click on the OK button.
  • If condition is satisfied: This option allows us to specify a rule based on the values of variables; all cases that meet the criteria are retained. After clicking on the radio button for this option, we click on the "If..." button to bring up an additional dialogue box where we can define the rule or rules for including or excluding data.

    Note: using the built-in SPSS numeric keypad, as seen in fig. 1.4, is advised so that the program handles the spacing automatically.
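The two options above can be pictured with a small Python sketch. This is an analogy only, not SPSS syntax. The case data, the variable names `age` and `spend`, and the helper `select_cases` are assumptions invented for the example.

```python
# Hypothetical cases from a retail data sheet.
cases = [
    {"id": 1, "age": 23, "spend": 1200},
    {"id": 2, "age": 41, "spend": 300},
    {"id": 3, "age": 35, "spend": 2500},
    {"id": 4, "age": 19, "spend": 800},
]


def select_cases(data, condition=None):
    """Return the cases kept for analysis.

    condition=None behaves like "All cases": nothing is filtered out.
    Otherwise only cases for which the rule returns True are retained,
    like "If condition is satisfied".
    """
    if condition is None:
        return list(data)
    return [case for case in data if condition(case)]


all_cases = select_cases(cases)
selected = select_cases(cases, lambda c: c["age"] >= 21 and c["spend"] > 1000)

print([c["id"] for c in selected])
```

The original list is left untouched, so calling `select_cases(cases)` again brings back every case, much like switching back to "All cases".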


Lecture 4:

In the fourth lecture we moved on to Level 2 of our studies and began with Cluster Analysis (with the help of a PowerPoint presentation).


What is Cluster Analysis?


Cluster analysis or clustering is the task of assigning a set of objects into groups (called clusters) so that the objects in the same cluster are more similar (in some sense or another) to each other than to those in other clusters.
(ref. link: http://en.wikipedia.org/wiki/Cluster_analysis)

Cluster analysis is further divided into:

  1. Hierarchical Clustering: In data mining, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
    • Agglomerative: This is a "bottom up" approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
    • Divisive: This is a "top down" approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy. (ref. link: http://en.wikipedia.org/wiki/Hierarchical_clustering)
  2. K-Means Clustering: In data mining, k-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. (ref. link: http://en.wikipedia.org/wiki/K-means_clustering)
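The agglomerative ("bottom up") idea can be sketched in a few lines of Python. This is a simplified illustration under some assumptions of our own: the distance between two clusters is taken as the distance between their closest points (single linkage, one of several possible choices), merging stops at a requested number of clusters, and the sample points are invented.

```python
import math


def single_linkage(a, b):
    """Distance between two clusters: the closest pair of points (an assumed choice)."""
    return min(math.dist(p, q) for p in a for q in b)


def agglomerative(points, n_clusters):
    """Start with one cluster per point and merge the closest pair repeatedly."""
    clusters = [[p] for p in points]
    while len(clusters) > max(n_clusters, 1):
        best = None  # (distance, index of first cluster, index of second cluster)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = single_linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]
    return clusters


# Hypothetical observations with two variables each.
points = [(1.0, 1.0), (1.5, 1.2), (8.0, 8.0), (8.3, 7.9), (1.2, 0.8)]
groups = agglomerative(points, 2)
print(groups)
```

Each pass through the loop is one step "up the hierarchy"; recording the merges in order would give the tree that hierarchical clustering is named after.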


How Does Cluster Analysis Work?

Data may be thought of as points in a space where the axes correspond to the variables.  Cluster analysis divides the space into regions characteristic of groups that it finds in the data.
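One way to picture "points in a space" being divided into regions is the k-means method, where each region is simply the set of points closest to one cluster mean. The sketch below is a bare-bones version written for illustration, not the routine any particular package uses. The sample points and the starting centroids are assumptions chosen by hand.

```python
import math


def mean_point(group):
    """Coordinate-wise mean of a non-empty list of points."""
    return tuple(sum(axis) / len(group) for axis in zip(*group))


def assign(points, centroids):
    """Put every point into the group of its nearest centroid."""
    groups = [[] for _ in centroids]
    for p in points:
        nearest = min(range(len(centroids)),
                      key=lambda i: math.dist(p, centroids[i]))
        groups[nearest].append(p)
    return groups


def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and mean update until the centroids stop moving."""
    centroids = list(centroids)
    for _ in range(max_iter):
        groups = assign(points, centroids)
        # An empty group keeps its old centroid.
        updated = [mean_point(g) if g else c for g, c in zip(groups, centroids)]
        if updated == centroids:
            break
        centroids = updated
    return centroids, assign(points, centroids)


# Hypothetical observations; each axis corresponds to one variable.
points = [(1.0, 1.0), (1.0, 3.0), (9.0, 9.0), (9.0, 11.0)]
centroids, groups = kmeans(points, [(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(groups)
```

Any new point can then be placed by checking which centroid it is nearest to, which is what dividing the space into regions amounts to here.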


Objectives of Cluster Analysis

  1. Discovering types, and
  2. Reducing the number of cases by making it possible to consider a few types instead of numerous records.
(ref. link: http://www.uic.edu/classes/idsc/ids472/clustering.htm)


