BLOG – DAY 1
Business Analytics (BA) refers to the skills,
technologies, applications and practices for continuous iterative exploration
and investigation of past business performance to gain insight and drive
business planning. Business analytics
focuses on developing new insights and understanding of business performance
based on data and statistical methods.
Today we started the session by learning the tool
SPSS and how it can be used in various ways.
There are two types of data –
1) Categorical:
A categorical variable (also known as a discrete variable) is one whose range
is countable; e.g. the variable answ has the values [yes, no, not sure], so answ
is a categorical variable with three possible values. Variables that could have
been measured as interval-level variables, such as age or income, can also be
made into categorical variables by creating categories that define specific
ranges for the variables (e.g., for age, the categories are 18-24 years, 25-34
years, and so on).
2) Continuous:
A continuous variable is one that can take any value within a range rather than
one of a fixed set of categories; e.g. weight is a continuous variable which can
take any value between 0 and 1000 kg (say) for a human being.
There are two views in SPSS –
1) DATA VIEW:
Data View is the view in which we
view and edit the actual data. This data can also be populated by importing Excel
files. Each column depicts a variable and each row holds one case (e.g., one respondent).
2) VARIABLE VIEW:
Variable View can be accessed by
using the tab given on the lower left corner. This is used to define
or change the names and other properties of each variable in the data set. In
this view, each variable is represented as a row, and various properties of the
variable are represented as columns, allowing us to change the properties of
existing variables or establish properties for new variables.
The variable properties and their functions are:
- Name - the unique variable name
- Type - the kind of data to be recorded (e.g., strings of characters, numeric values, or special numbers like dates)
- Width - the number of characters used to display the data
- Decimals - the number of decimal places displayed
- Label - a text entry to describe the data provided by the variable. With questionnaires, for example, the label is usually the text of the question. This is also used in the output files rather than the variable name.
- Missing - the values that should be treated as missing data. If a respondent does not provide an answer (for example, because he/she did not understand the question), the field is filled with a code specified here. It is important to define it so that these codes are excluded from the analysis.
- Values - if specific numeric values have a non-intuitive meaning, these values can be labeled (e.g., 1 = male and 2 = female)
- Columns - determines how wide the variable column should be in Data View mode
- Align - determines whether the data should be left-justified, right-justified, or centered
- Measure - describes the level of measurement (e.g., nominal, ordinal, or scale)
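To make the role of the Values and Missing properties concrete, here is a minimal Python sketch (not SPSS syntax). The dictionary layout, the label text, the missing code 99 and the sample data are all assumptions made up for illustration; only the 1 = male / 2 = female labels come from the list above.

```python
# Hypothetical variable definition mirroring a few Variable View columns.
gender = {
    "name": "gender",
    "label": "Gender of respondent",      # assumed label text
    "values": {1: "male", 2: "female"},   # value labels, as in the list above
    "missing": {99},                      # assumed code meaning "no answer"
    "measure": "nominal",
}

# Hypothetical raw codes as they might appear in Data View.
raw = [1, 2, 2, 99, 1, 2]


def labelled_valid_values(variable, data):
    """Drop codes declared as missing, then swap each code for its value label."""
    return [
        variable["values"].get(code, code)
        for code in data
        if code not in variable["missing"]
    ]


print(labelled_valid_values(gender, raw))
```

The code 99 is dropped before labelling, which is roughly what declaring a missing value is meant to achieve: the code stays in the data but is kept out of the analysis.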
Types of Measures:
1) Nominal:
A nominal scale only names things.
It simply places data into categories, without any order or
structure.
An example of a nominal scale is
the terms we use for colours. The underlying spectrum is ordered but the names
are nominal.
In research activities a YES/NO
scale is nominal. It has no order and there is no distance between YES and NO.
The statistics which can be used
with nominal scales are in the non-parametric group. The most likely ones would
be mode and cross tabulation - with chi-square.
Location | Values
Delhi | 50
Chennai | 60
Bangalore | 80
Hyderabad | 10
In the above table, the values do not hold any meaning; they serve only as
labels for the locations.
2) Ordinal:
In this scale, values are given
based on an order. When a market researcher asks you to rank 5 types of beer
from most flavourful to least flavourful, he/she is asking you to create an
ordinal scale of preference. An ordinal scale only lets you interpret gross
order and not the relative positional distances. Ordinal data would use
non-parametric statistics. These would include median and mode, rank order
correlation and non-parametric analysis of variance.
Location | Values
Delhi | 40
Chennai | 30
Bangalore | 20
Hyderabad | 10
In the above table, the values are given based on
the size of the city.
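The rank order correlation mentioned above for ordinal data can be sketched in plain Python. This is an illustration only, not how SPSS computes it internally: it calculates Spearman's rank correlation as the Pearson correlation of the ranks, and the two tasters' beer rankings are invented data.

```python
def ranks(values):
    """Assign ranks (1 = smallest); tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    spread_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return covariance / (spread_x * spread_y)


def spearman(x, y):
    """Spearman's rank correlation: Pearson correlation applied to the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical rankings of the same 5 beers by two tasters (1 = most flavourful).
taster_a = [1, 2, 3, 4, 5]
taster_b = [2, 1, 3, 5, 4]

print(round(spearman(taster_a, taster_b), 3))
```

Only the order of the values is used, never the distance between them, which is why this statistic suits ordinal data.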
3) Scale:
A scale measure (interval or ratio level) tells both the rank of the values and
the size of the difference between them, e.g. age in years or income.
Output file
This file shows the results of all the procedures that a user runs in SPSS.
Types of Analysis -
Univariate analysis is the simplest form of quantitative analysis. It describes
a single variable and its attributes for the applicable unit
of analysis. For example, if the variable age were the subject of the analysis,
the researcher would look at how many subjects fall into each age
category. Ex: Pie charts.
The other types are bivariate analysis (the analysis of two variables simultaneously)
and multivariate analysis (the analysis of multiple variables simultaneously).
Univariate analysis is used primarily for descriptive purposes, while
bivariate and multivariate analysis are geared more towards explanatory
purposes. Univariate analysis is commonly used in the first stages of research,
in analyzing the data at hand, before being supplemented by more advanced,
inferential bivariate or multivariate analysis. Ex. of bivariate analysis: Scatter Diagram.
Transform Menu
The option “Recode into Different Variables..” was used in class. Example: if data
like ‘Age’ is recorded as individual numeric values and it needs to be
changed into categories, i.e. ‘Age Group’, then this option can be used. The original
variable is kept and the categories are written to a new variable. This helps
in better analysis over a given range of the variable rather than over individual
values.
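The idea behind recoding into a different variable can be sketched in Python as below. It only illustrates the logic and is not SPSS syntax. The 18-24 and 25-34 bands are the ones used earlier in this post; the 35-44 band, the "Other" fallback and the sample ages are assumptions added for the example.

```python
# Age bands as (lowest age, highest age, label).
AGE_BANDS = [
    (18, 24, "18-24 years"),
    (25, 34, "25-34 years"),
    (35, 44, "35-44 years"),  # assumed band, not from the class notes
]


def recode_age(age):
    """Return the label of the band containing `age`, or 'Other' if none matches."""
    for low, high, label in AGE_BANDS:
        if low <= age <= high:
            return label
    return "Other"  # assumed catch-all for ages outside the defined bands


# Hypothetical values of the original 'Age' variable.
ages = [19, 24, 25, 33, 40, 61]

# The recoded values go into a new variable; `ages` itself is left unchanged.
age_group = [recode_age(a) for a in ages]
print(age_group)
```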
Analyze Menu
a) Frequencies
Under this menu the option “Descriptive Statistics” --> “Frequencies..” was explored.
The frequencies procedure is
primarily used for discrete data (e.g., nominal and ordinal data), although
there are a number of options that are useful for scale level data.
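What a frequency table contains can be sketched in Python as follows. The responses for the variable answ are invented, and the layout of the table (value, count, percent) is a simplification of the SPSS output rather than a copy of it.

```python
from collections import Counter


def frequency_table(values):
    """Return (value, count, percent) rows, most frequent value first."""
    counts = Counter(values)
    total = len(values)
    return [
        (value, count, round(100 * count / total, 1))
        for value, count in counts.most_common()
    ]


# Hypothetical responses for the categorical variable answ.
answ = ["yes", "no", "yes", "not sure", "yes", "no"]

for value, count, percent in frequency_table(answ):
    print(value, count, percent)
```

The first row of the table is also the mode, the statistic suggested above for nominal data.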
b) Descriptives
Under this option, summary statistics such as the mean, standard deviation,
minimum and maximum can be computed for scale level variables.
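As a rough Python equivalent of these summary statistics, the sketch below uses the standard statistics module. The weights are made-up sample data, and the sample standard deviation (dividing by n - 1) is used.

```python
import statistics


def describe(values):
    """Summarise a scale-level variable with a few basic statistics."""
    return {
        "n": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.mean(values),
        "std_dev": statistics.stdev(values),  # sample standard deviation (n - 1)
    }


# Hypothetical weights in kg for five respondents.
weights_kg = [52.0, 61.5, 70.0, 58.5, 83.0]

summary = describe(weights_kg)
print(summary)
```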
c) Crosstabs
This can be accessed by “Descriptive Statistics” --> “Crosstabs..” It cross-tabulates
two variables, thus displaying their relationship in tabular form. In contrast
to Frequencies, which summarizes information about one variable, Crosstabs
generates information about bivariate relationships. Because Crosstabs creates a row
for each value in one variable and a column for each value in the other, the
procedure is not suitable for continuous variables that assume many values.
Crosstabs is designed for discrete variables, usually those measured on nominal
or ordinal scales. Crosstabs are usually presented with the independent
variable across the top and the dependent along the side.
The “Statistics” button can be used to include ‘chi-square’ to assess the level of significance.
The “Cells” button can be used to include percentages in rows and columns.
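The cross-tabulation and the chi-square statistic can be sketched in plain Python as below. This is an illustration of the calculation, not the SPSS procedure itself: it computes the Pearson chi-square without any continuity correction, and the eight respondents are invented data. The dependent variable (answ) goes along the side and the independent variable (gender) across the top, as described above.

```python
from collections import Counter


def crosstab(rows, cols):
    """Count how often each (row value, column value) pair occurs."""
    return Counter(zip(rows, cols))


def chi_square(rows, cols):
    """Return the Pearson chi-square statistic and its degrees of freedom."""
    observed = crosstab(rows, cols)
    row_totals = Counter(rows)
    col_totals = Counter(cols)
    n = len(rows)
    statistic = 0.0
    for r in row_totals:
        for c in col_totals:
            # Count expected in this cell if the two variables were unrelated.
            expected = row_totals[r] * col_totals[c] / n
            statistic += (observed[(r, c)] - expected) ** 2 / expected
    degrees_of_freedom = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, degrees_of_freedom


# Hypothetical data for 8 respondents.
gender = ["male"] * 4 + ["female"] * 4
answ = ["yes", "yes", "yes", "no", "yes", "no", "no", "no"]

table = crosstab(answ, gender)
statistic, dof = chi_square(answ, gender)
print(dict(table))
print(statistic, dof)
```

To judge significance, the statistic is compared against the chi-square distribution for the given degrees of freedom; with 1 degree of freedom the critical value at the 5% level is about 3.84.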
By Supriya Gurtu, Ankit Aggarwal