Monday, September 3, 2012

Day 1 - Team J


BLOG – DAY 1

Business Analytics (BA) refers to the skills, technologies, applications and practices for continuous iterative exploration and investigation of past business performance to gain insight and drive business planning.  Business analytics focuses on developing new insights and understanding of business performance based on data and statistical methods.
Today we started the session by learning the tool SPSS and how it can be used in various ways.
There are two types of data –
1)      Categorical: A categorical variable (also known as a discrete variable) is one whose range is countable; e.g. the variable answ has values [yes, no, not sure], so answ is a categorical variable with three possible values. Variables that could be measured at the interval level, such as age or income, can also be made into categorical variables by creating categories that define specific ranges (e.g., for age, the categories are 18-24 years, 25-34 years, and so on).
2)      Continuous: A continuous variable is one which can take any value within a range; e.g. weight is a continuous variable which can take any value between 0 and 1000 kg (say) for a human being.

There are two views in SPSS – 

1)      DATA VIEW:
Data View is the view in which we view and edit the actual data. This data can also be populated by importing Excel files. Each column depicts a variable and each row depicts a case (e.g. one respondent).

2)      VARIABLE VIEW:
Variable View can be accessed by using the tab given on the lower left corner. This is used to define or change the names and other properties of each variable in the data set. In this view, each variable is represented as a row, and various properties of the variable are represented as columns, allowing us to change the properties of existing variables or establish properties for new variables.
The variable properties and their functions are:
  • Name - the unique variable name
  • Type - the kind of data to be recorded (e.g., strings of characters, numeric values, or special numbers like dates)
  • Width - the number of characters used to display the data
  • Decimals - the number of decimal places displayed
  • Label - a text entry to describe the data provided by the variable. With questionnaires, for example, the label is usually the text of the question. This is also used in the output files rather than the variable name.
  • Missing – the values that should be treated as missing data (e.g. a code such as 99 entered when a respondent does not provide an answer, perhaps because the question was not understood). It is important to define these so that they are excluded from the analysis.
  • Values - if specific numeric values have a non-intuitive meaning, these values can be labeled (e.g., 1 = male and 2 = female)
  • Columns - determines how wide the variable column should be in Data View mode
  • Align - determines whether the data should be left-justified, right-justified, or centered
  • Measure - describes the level of measurement (e.g., nominal, ordinal, or scale)
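
The Values and Missing properties can be illustrated outside SPSS with a small Python sketch. This is not SPSS code; the codes 1, 2 and 99, and the names `value_labels`, `missing_codes` and `decode`, are assumptions made up for the example.

```python
# Hypothetical metadata for one variable, mirroring the Values and Missing
# columns of Variable View. The codes 1, 2 and 99 are illustrative only.
value_labels = {1: "male", 2: "female"}
missing_codes = {99}


def decode(raw_values, labels, missing):
    """Replace numeric codes with labels; codes declared missing become None."""
    decoded_values = []
    for code in raw_values:
        if code in missing:
            decoded_values.append(None)
        else:
            # Codes without a label are passed through unchanged.
            decoded_values.append(labels.get(code, code))
    return decoded_values


responses = [1, 2, 99, 2, 1]
decoded = decode(responses, value_labels, missing_codes)
valid = [value for value in decoded if value is not None]

print(decoded)
print(len(valid), "valid responses out of", len(responses))
```

Only the valid responses would then be counted in later statistics, which is the reason for declaring missing codes up front.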
Types of Measures:
1)      Nominal:
A nominal scale simply names things: it places data into categories, without any order or structure.

An example of a nominal scale is the terms we use for colours. The underlying spectrum is ordered but the names are nominal.

In research activities a YES/NO scale is nominal. It has no order and there is no distance between YES and NO.
The statistics which can be used with nominal scales are in the non-parametric group. The most likely ones would be the mode and cross tabulation with chi-square.
Location      Values
Delhi         50
Chennai       60
Bangalore     80
Hyderabad     10
In the above table, the values are only labels for the locations; their order and magnitude do not hold any meaning.
2)      Ordinal:
In this scale, values are given based on an order. When a market researcher asks you to rank 5 types of beer from most flavourful to least flavourful, he/she is asking you to create an ordinal scale of preference. An ordinal scale only lets you interpret gross order and not the relative positional distances. Ordinal data would use non-parametric statistics. These would include median and mode, rank order correlation, and non-parametric analysis of variance.
Location      Values
Delhi         40
Chennai       30
Bangalore     20
Hyderabad     10
In the above table, the values are given based on the size of the city.
3)      Scale:
This scale tells both the rank of the values and the size of the difference between them (e.g. age in years or income).
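
The rank order correlation mentioned for ordinal data can be sketched in Python as follows. This is a minimal illustration of the Spearman formula for data without tied ranks, not the SPSS procedure; the two tasters and their rankings are invented for the example.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes no tied values."""
    ordered = sorted(values)
    return [ordered.index(value) + 1 for value in values]


def spearman_rho(x, y):
    """Spearman rank correlation for two equal-length lists without ties."""
    n = len(x)
    rank_x, rank_y = ranks(x), ranks(y)
    d_squared = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical flavour rankings of five beers by two tasters
# (1 = most flavourful, 5 = least flavourful).
taster_a = [1, 2, 3, 4, 5]
taster_b = [2, 1, 3, 5, 4]

rho = spearman_rho(taster_a, taster_b)
print(round(rho, 2))
```

A value near 1 means the two tasters order the beers in almost the same way; only the order is used, not the distance between ranks, which is why the measure suits ordinal data.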

Output file
This file shows the output of all the functions that are used by a user in SPSS.

Types of Analysis -
Univariate analysis is the simplest form of quantitative analysis. It describes a single variable and its attributes for the applicable unit of analysis. For example, if the variable age were the subject of the analysis, the researcher would look at how many subjects fall into each age category. Ex: Pie charts.

The other types are bivariate analysis, the analysis of two variables simultaneously, and multivariate analysis, the analysis of multiple variables simultaneously. Univariate analysis is used primarily for descriptive purposes, while bivariate and multivariate analysis are geared more towards explanatory purposes. Univariate analysis is commonly used in the first stages of research, in analyzing the data at hand, before being supplemented by more advanced, inferential bivariate or multivariate analysis. Ex: Scatter Diagram.

Transform Menu
The option “Recode into Different Variables..” was used in class. Example – if data like ‘Age’ is recorded as individual numeric values and needs to be changed into categories, i.e. ‘Age Group’, then this option can be used. This supports analysis over ranges of the variable rather than over individual values.
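
The idea behind recoding ‘Age’ into ‘Age Group’ can be sketched in Python. This is not SPSS syntax; the bands 18-24 and 25-34 come from the example earlier in this post, while the remaining bands, the "other" label and the name `recode_age` are assumptions for illustration.

```python
# Hypothetical age bands as (lowest age, highest age, label).
AGE_BANDS = [
    (18, 24, "18-24"),
    (25, 34, "25-34"),
    (35, 44, "35-44"),
    (45, 54, "45-54"),
]


def recode_age(age):
    """Return the label of the band containing age, or 'other' if none does."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    return "other"


ages = [19, 24, 25, 40, 61]
# The original values are kept and the groups are stored in a new list,
# just as the SPSS option writes its result into a different variable.
age_groups = [recode_age(age) for age in ages]
print(age_groups)
```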

Analyze Menu
a)      Frequencies
Under this menu the option “Descriptive Statistics” --> “Frequencies..”  was explored. The frequencies procedure is primarily used for discrete data (e.g., nominal and ordinal data), although there are a number of options that are useful for scale level data.
b)      Descriptives
Under this option, summary statistics such as the mean, standard deviation, minimum and maximum can be computed for scale level variables.
c)      Crosstabs
This can be accessed by “Descriptive Statistics” --> “Crosstabs..”. It cross-tabulates two variables, thus displaying their relationship in tabular form. In contrast to Frequencies, which summarizes information about one variable, Crosstabs generates information about bivariate relationships. Because Crosstabs creates a row for each value in one variable and a column for each value in the other, the procedure is not suitable for continuous variables that assume many values. Crosstabs is designed for discrete variables, usually those measured on nominal or ordinal scales. Crosstabs are usually presented with the independent variable across the top and the dependent variable along the side.
The “Statistics” button can be used to request the chi-square test, which indicates whether the relationship between the two variables is statistically significant.
The “Cells” button can be used to include row and column percentages.
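
What the three procedures above compute can be sketched in plain Python. This is only an illustration, not SPSS output: the survey records are invented, and the chi-square part computes just the test statistic, without the significance level that SPSS reports alongside it.

```python
from collections import Counter
import statistics

# Hypothetical survey records: (gender, answer, age). Illustrative data only.
records = [
    ("male", "yes", 23), ("male", "no", 35), ("male", "yes", 41),
    ("female", "no", 29), ("female", "no", 52), ("female", "yes", 38),
    ("male", "yes", 27), ("female", "no", 31),
]
n = len(records)

# Frequencies: counts and percentages for one categorical variable.
answer_counts = Counter(answer for _, answer, _ in records)
answer_percent = {key: 100 * count / n for key, count in answer_counts.items()}

# Descriptives: summary statistics for one scale variable.
ages = [age for _, _, age in records]
age_summary = {
    "mean": statistics.mean(ages),
    "std_dev": statistics.stdev(ages),
    "min": min(ages),
    "max": max(ages),
}

# Crosstabs: counts for every (gender, answer) combination.
crosstab = Counter((gender, answer) for gender, answer, _ in records)
row_totals = Counter(gender for gender, _, _ in records)
col_totals = answer_counts

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected,
# where expected = row total * column total / number of cases.
chi_square = 0.0
for gender in row_totals:
    for answer in col_totals:
        expected = row_totals[gender] * col_totals[answer] / n
        observed = crosstab[(gender, answer)]
        chi_square += (observed - expected) ** 2 / expected

print(dict(answer_counts), answer_percent)
print(age_summary)
print(dict(crosstab), chi_square)
```

Here gender and answer are discrete variables, so they suit Frequencies and Crosstabs, while age is a scale variable, so it suits Descriptives.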

By
Supriya Gurtu
Ankit Aggarwal

