What is an OLAP Cube?
An OLAP cube is an array of data that is understood in terms of its 0 or more dimensions. OLAP is an acronym for online analytical processing, a computer-based technique for analyzing business data in the search for business intelligence. In other words, it is primarily concerned with reading and aggregating large groups of diverse data that are linked by complex relationships. OLAP analyzes these relationships and looks for patterns, trends and exception conditions.
Why are OLAP cubes important?
Before OLAP technology was well developed, data had to be extracted from databases using "queries". This meant that the analyst had to structure a request to the database for the information desired and then submit this query to the database server. That server would process the query and return the results. Depending on the size of the database and the data requested, this query could take minutes or hours to complete.
OLAP cubes are fundamentally different in that they "pre-aggregate" the data used to answer many of the queries that are anticipated. This pre-aggregation occurs when the cube is built, which means that the work is already done by the time the user queries the data.
In addition, the size of an OLAP cube depends on the number of measures and dimensions it contains; it may have no relationship to the size of the initial data set. Therefore, a claims data set having millions of members can be consolidated into a relatively small OLAP cube that can return data almost instantaneously.
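The idea of pre-aggregation can be illustrated with a minimal Python sketch. The claims records, the dimension names (region, plan) and the build_cube helper are all hypothetical and chosen only for illustration; a real OLAP engine does far more than this.

```python
from collections import defaultdict

# Hypothetical claims records: (region, plan, claim_amount).
claims = [
    ("North", "Gold", 120.0),
    ("North", "Gold", 80.0),
    ("North", "Silver", 50.0),
    ("South", "Gold", 200.0),
    ("South", "Silver", 30.0),
    ("South", "Silver", 70.0),
]

def build_cube(rows):
    """Pre-aggregate the total and count for every (region, plan) cell."""
    cube = defaultdict(lambda: {"total": 0.0, "count": 0})
    for region, plan, amount in rows:
        cell = cube[(region, plan)]
        cell["total"] += amount
        cell["count"] += 1
    return dict(cube)

# The aggregation work happens once, when the cube is built.
cube = build_cube(claims)

# Answering a query is now a lookup, not a scan over the raw rows.
north_gold = cube[("North", "Gold")]
print(north_gold)

# The cube has one cell per combination of dimension values,
# however many raw rows went into it.
print(len(claims), "rows ->", len(cube), "cells")
```

Adding more claims for the same regions and plans would make the raw data larger but would leave the number of cells in this cube unchanged.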
Types of Variables in OLAP Cubes
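1. Summary Variables: the quantitative variables that are summarized (for example, sales revenue).
2. Grouping Variables: the categorical variables that define the groups within which the summaries are computed (for example, state or product).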
Steps in Producing a Three-Dimensional Table
1. Choose File --> Open --> Data and open the data file.
2. Choose Analyze --> Reports --> OLAP Cubes.
3. In the list on the left of the OLAP Cubes dialog box:
a. Select the relevant variables and move them to the Summary Variables panel.
b. Select the relevant variables and move them to the Grouping Variables panel.
4. Click the Statistics button, select the statistics you want, and then click the OK button.
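What the dialog steps above produce can be approximated in a few lines of Python. This is only a rough sketch: the cases, the variable names (region, gender, sales) and the summarize helper are hypothetical, and the three statistics shown (N, sum, mean) are just examples of the kind of summaries that can be requested.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical cases: two grouping variables and one summary variable.
cases = [
    {"region": "East", "gender": "F", "sales": 100.0},
    {"region": "East", "gender": "F", "sales": 140.0},
    {"region": "East", "gender": "M", "sales": 90.0},
    {"region": "West", "gender": "F", "sales": 60.0},
    {"region": "West", "gender": "M", "sales": 110.0},
    {"region": "West", "gender": "M", "sales": 130.0},
]

def summarize(rows, summary_var, grouping_vars):
    """Return N, sum and mean of summary_var for each combination of grouping_vars."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[g] for g in grouping_vars)
        groups[key].append(row[summary_var])
    return {
        key: {"n": len(values), "sum": sum(values), "mean": mean(values)}
        for key, values in groups.items()
    }

table = summarize(cases, "sales", ["region", "gender"])
for key in sorted(table):
    print(key, table[key])
```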
OLAP cubes are the basis for slicing and dicing in data mining. Slicing means taking a slice out of a cube by fixing a value for a selected dimension (for example, customer segment) and reading the measures (sales revenue, sales units) or KPIs (sales productivity) for that value. Dicing means viewing the slices from different angles, for example revenue for different products within a given state, or revenue for different states for a given product.
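The two views in the example above can be sketched in Python as follows. The revenue figures, the state and product names, and the two helper functions are hypothetical; the cube is modelled as a plain dictionary keyed by (state, product).

```python
# Hypothetical cube: revenue already aggregated by (state, product).
revenue = {
    ("Texas", "Laptop"): 500.0,
    ("Texas", "Phone"): 300.0,
    ("Ohio", "Laptop"): 200.0,
    ("Ohio", "Phone"): 400.0,
}

def slice_by_state(cube, state):
    """Fix the state dimension and return revenue per product."""
    return {product: value for (s, product), value in cube.items() if s == state}

def slice_by_product(cube, product):
    """Fix the product dimension and return revenue per state."""
    return {state: value for (state, p), value in cube.items() if p == product}

# Revenue for different products within a given state.
texas = slice_by_state(revenue, "Texas")
print(texas)

# The same cube seen from another angle: revenue for different states for a given product.
laptops = slice_by_product(revenue, "Laptop")
print(laptops)
```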
Jaccard distance
Jaccard distance depends on the number of matching values that the two variables being analyzed have. For example, if X and Y are 2 variables whose possible binary values are YES and NO, then the Jaccard distance between X and Y can be calculated using the formula:
1 - (Number of YES matches / (total number of responses - Number of NO matches))
So, the more similar the variation in the variables, the smaller the Jaccard distance.
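A small Python sketch of this calculation is given below. It assumes the usual definition of Jaccard distance for binary data: 1 minus the number of YES matches divided by the total number of responses less the NO matches. The function name and the sample answers are made up for illustration.

```python
def jaccard_distance(x, y):
    """Jaccard distance between two equal-length lists of "YES"/"NO" answers."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of responses")
    yes_matches = sum(1 for a, b in zip(x, y) if a == "YES" and b == "YES")
    no_matches = sum(1 for a, b in zip(x, y) if a == "NO" and b == "NO")
    denominator = len(x) - no_matches
    if denominator == 0:
        # Assumption: two variables that are NO everywhere are treated as identical.
        return 0.0
    return 1.0 - yes_matches / denominator

# Hypothetical responses for two variables.
x = ["YES", "YES", "NO", "NO", "YES"]
y = ["YES", "NO", "NO", "YES", "YES"]

d = jaccard_distance(x, y)
print(d)
```

Here there are 2 YES matches, 1 NO match and 5 responses, so the distance is 1 - 2/(5 - 1) = 0.5. Comparing a variable with itself gives a distance of 0.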
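Euclidean distance
The Euclidean distance between two points is the square root of the sum of the squared differences between their coordinates. There are a number of ways to use it to measure the distance between two clusters; one of them is the average method. In the average method we calculate the Euclidean distance between every pair of points, one from each cluster, then average all of these distances, and that average is taken as the distance between the two clusters.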
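The average method can be sketched in Python as follows. The two clusters and the function names are hypothetical, and the sketch assumes that every point in one cluster is paired with every point in the other.

```python
import math

def euclidean(p, q):
    """Euclidean distance between two points with the same number of coordinates."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def average_distance(cluster_a, cluster_b):
    """Average of the Euclidean distances over all pairs (one point from each cluster)."""
    distances = [euclidean(p, q) for p in cluster_a for q in cluster_b]
    return sum(distances) / len(distances)

# Hypothetical clusters of two-dimensional points.
cluster_a = [(0.0, 0.0), (0.0, 4.0)]
cluster_b = [(3.0, 0.0), (3.0, 4.0)]

avg = average_distance(cluster_a, cluster_b)
print(avg)
```

For these points the four pairwise distances are 3, 5, 5 and 3, so the average distance between the clusters is 4.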