Wednesday, September 5, 2012

Day 3 - TEAM D - Aman Malhotra


What is an OLAP Cube? 

An OLAP cube is a multidimensional array of data, understood in terms of its zero or more dimensions. OLAP stands for online analytical processing, a computer-based technique for analyzing business data in search of business intelligence. In other words, it is primarily concerned with reading and aggregating large groups of diverse data that are linked by complex relationships. OLAP analyzes these relationships and looks for patterns, trends and exception conditions.


Why are OLAP cubes important?
Before OLAP technology was well developed, data had to be extracted from databases using "queries". This meant that the analyst had to structure a request to the database for the desired information and then submit this query to the database server. The server would process the query and return the results. Depending on the size of the database and the data requested, the query could take minutes or hours to complete.
OLAP cubes are fundamentally different in that they "pre-aggregate" the data used to answer many of the queries that are anticipated. This pre-aggregation occurs when the cube is built, which means the work is already done by the time the user queries the data.
In addition, the size of an OLAP cube depends on the number of measures and dimensions it contains; it may have no relationship to the size of the initial data set. Therefore, a claims data set with millions of members can be consolidated into a relatively small OLAP cube that returns data almost instantaneously.
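
To make the idea of pre-aggregation concrete, here is a minimal sketch in Python. The claims records, the dimension names and the `build_cube` helper are all made up for illustration; a real OLAP engine is far more sophisticated. The point is only that totals are computed once, when the cube is built, so a later query is a simple lookup.

```python
from collections import defaultdict
from itertools import product

# Hypothetical claims records: (region, product, year, amount)
RECORDS = [
    ("North", "Auto", 2011, 120.0),
    ("North", "Home", 2011, 80.0),
    ("South", "Auto", 2011, 200.0),
    ("South", "Auto", 2012, 50.0),
    ("North", "Auto", 2012, 70.0),
]

ALL = "*"  # marker meaning "aggregated over every value of this dimension"


def build_cube(records):
    """Pre-aggregate the amount for every combination of dimension values."""
    cube = defaultdict(float)
    for *dims, amount in records:
        # Each dimension is either kept at its own value or rolled up to ALL.
        for key in product(*[(d, ALL) for d in dims]):
            cube[key] += amount
    return dict(cube)


cube = build_cube(RECORDS)

# Queries are now dictionary lookups; RECORDS is not scanned again.
total = cube[(ALL, ALL, ALL)]
north_auto = cube[("North", "Auto", ALL)]
claims_2011 = cube[(ALL, ALL, 2011)]
print(total, north_auto, claims_2011)
```

Note that the number of cells in `cube` is bounded by the distinct values of each dimension, not by the number of records, which is why a very large data set can collapse into a small cube.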

Types of Variables in OLAP Cubes
1. Summary Variables
2. Grouping Variables

Steps in Producing a Three-Dimensional Table
1. Choose File --> Open --> Data and open the data file
2. Choose Analyze --> Reports --> OLAP Cubes
3. In the list on the left of the OLAP Cubes dialog box:
        a. Select the relevant variables and move them to the Summary Variables panel.
        b. Select the relevant variables and move them to the Grouping Variables panel.
4. Click the Statistics button to select the statistics you want, then click the OK button.
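
The menu steps above produce a table of statistics for the summary variables, broken down by the grouping variables. As a rough illustration of what that table contains (not of how the software computes it), here is a small Python sketch. The rows, the variable names and the `olap_table` helper are hypothetical, and only three statistics (sum, count, mean) are shown.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: "region" and "product" are grouping variables,
# "sales" is the summary variable.
rows = [
    {"region": "East", "product": "A", "sales": 10.0},
    {"region": "East", "product": "B", "sales": 30.0},
    {"region": "West", "product": "A", "sales": 20.0},
    {"region": "West", "product": "A", "sales": 40.0},
]


def olap_table(rows, grouping, summary):
    """Summarize one summary variable for each combination of grouping values."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[g] for g in grouping)
        groups[key].append(row[summary])
    return {
        key: {"sum": sum(values), "n": len(values), "mean": mean(values)}
        for key, values in groups.items()
    }


table = olap_table(rows, ["region", "product"], "sales")
for key, stats in sorted(table.items()):
    print(key, stats)
```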

OLAP cubes are the basis for slicing and dicing in data mining. Slicing means taking a slice out of the cube by fixing a value for a selected dimension (for example, customer segment) and looking at the measures (sales revenue, sales units) or KPIs (sales productivity) for it. Dicing means viewing the slices from different angles. For example: revenue for different products within a given state, or revenue for different states for a given product.
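
The state/product example can be sketched in a few lines of Python. The fact rows and the `slice_and_group` helper below are invented for illustration; the sketch fixes one dimension at a value (the slice) and then totals revenue along the other dimension, once from each "angle".

```python
from collections import defaultdict

# Hypothetical fact rows: (state, product, revenue)
FACTS = [
    ("Texas", "Laptop", 500.0),
    ("Texas", "Phone", 300.0),
    ("Ohio", "Laptop", 200.0),
    ("Ohio", "Phone", 100.0),
    ("Texas", "Laptop", 250.0),
]
DIMENSIONS = ("state", "product")


def slice_and_group(facts, fixed_dim, fixed_value, group_dim):
    """Fix one dimension at a value, then total revenue by another dimension."""
    fixed_i = DIMENSIONS.index(fixed_dim)
    group_i = DIMENSIONS.index(group_dim)
    totals = defaultdict(float)
    for row in facts:
        if row[fixed_i] == fixed_value:
            totals[row[group_i]] += row[-1]  # revenue is the last field
    return dict(totals)


# Revenue for different products within a given state
by_product_in_texas = slice_and_group(FACTS, "state", "Texas", "product")
# Revenue for different states for a given product
by_state_for_laptop = slice_and_group(FACTS, "product", "Laptop", "state")
print(by_product_in_texas)
print(by_state_for_laptop)
```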

Jaccard distance-
Jaccard distance depends on how many values two variables have in common. For example, if X and Y are two variables whose possible binary values are YES and NO, the Jaccard distance between X and Y can be calculated using the formula:
1 - (number of YES matches / (total number of responses - number of NO matches))
So the more similar the two variables are, the smaller the Jaccard distance.
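
Here is a minimal Python sketch of Jaccard distance for two YES/NO variables, using the standard definition: one minus the number of YES-YES matches divided by the number of responses that are not NO-NO matches. The function name and the sample responses are made up, and returning 0.0 when every response is a NO-NO match is a convention assumed here.

```python
def jaccard_distance(x, y):
    """Jaccard distance between two equal-length lists of booleans (True = YES)."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of responses")
    yes_matches = sum(1 for a, b in zip(x, y) if a and b)
    no_matches = sum(1 for a, b in zip(x, y) if not a and not b)
    considered = len(x) - no_matches  # NO-NO matches are ignored
    if considered == 0:
        return 0.0  # assumed convention: two all-NO variables are identical
    return 1.0 - yes_matches / considered


x = [True, True, False, False, True]
y = [True, False, False, True, True]
d = jaccard_distance(x, y)
print(d)  # 2 YES matches out of 4 considered responses -> 0.5
```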
Euclidean distance-
Euclidean distance is the straight-line distance between two points. There are a number of ways to extend it to the distance between two clusters; one of them is the average method. In the average method we calculate the Euclidean distance between every pair of points taken one from each cluster, then average all of these distances, and that average is the distance between the two clusters.
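
The average method can be sketched in Python as follows. The cluster points and the `average_distance` name are made up for illustration; `math.dist` (Python 3.8 and later) gives the Euclidean distance between two points.

```python
from itertools import product
from math import dist


def average_distance(cluster_a, cluster_b):
    """Average Euclidean distance over all pairs, one point from each cluster."""
    pairs = list(product(cluster_a, cluster_b))
    return sum(dist(p, q) for p, q in pairs) / len(pairs)


# Hypothetical clusters of 2-D points
cluster_a = [(0.0, 0.0)]
cluster_b = [(3.0, 4.0), (6.0, 8.0)]

# Pairwise distances are 5 and 10, so the average is 7.5
avg = average_distance(cluster_a, cluster_b)
print(avg)
```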
