Wednesday, 10 August 2011

First Tut! It's all starting to make sense now!


Ah this semester is so busy!

I’m running very late on my blogging so have to keep this short, and as concise as possible, and catch up!

After an exciting intro to the world of Business Intelligence in Steve’s first lecture, Sindy's tutorial was amazingly very useful in tying everything together, and clarifying what exactly BI is, and how it’s connected to other technologies.

The image below illustrates what we learnt and discussed, and for people like me, who haven’t taken FIT5095 – Data Warehousing, there were a lot of new terms introduced. :)


The story is that organizations have multiple departments (e.g. Sales, Finance, Marketing, etc.) and hence, multiple data sources, i.e. their transactional systems (e.g. ERPs, CRMs, etc.). If these systems are in silos, not sharing information, there are multiple issues arising such as data redundancy, varying and inconsistent data formats, etc.

To solve this, in layman's terms, we go for integration!

Basically, a centrally managed and integrated database containing data from all transactional sources in an organization, in other words, the famous -> Data Warehouse - a dedicated database which contains detailed, stable, non-volatile and consistent data which can be analyzed in the time variant.

Now to source this data, we use the ETL process. Another new term…yay!
  • Extracting data from transactional systems
  • Transforming it to fit operational needs, including the transformation from E-R model to dimensional model (discussed in detail later on below)
  • Loading it into the end target, i.e. the data warehouse

So now we have multiple goodies that come out, such as the OLAP Analysis and Data Mining. Regarding Data Mining, which I’ll study at the end of next year, for now, all I need to know is that it’s a study of drilling down patterns, trends, and behaviors.

In this course, we’ll be focusing on the OLAP / presentation layer…hmmm so what is OLAP???

OLAP, On-Line Analytical Processing, is a class of technology used to build shared, data-oriented decision support systems, in which we can slice and dice data. In Steve’s words, they are often talked about as “pivot tables on steroids”.
In essence, the format of data is such that decisions can be made at the speed of thought! (love this statement) :D

OLTP (On-Line Transactional Processing) Systems are built to optimize transactional processing speeds, hence, the E-R model is utilized. However, OLAPs are designed to get an overview analysis of whats happening.
This is done through dimensional modelling, i.e. the star-schema, where there is a Fact Table (numeric values that can be aggregated), and Dimension Tables (groups of hierarchies and descriptors that define the facts.

So what are the cubes then???

Well, OLAP cubes (not strictly cuboids - no wonder the term is controversial - it's not applicable if there are more than three dimensions), is the name given to the process of linking data from the different dimensions. The cubes can be developed along business units such as sales or marketing. Or a giant cube can be formed with all the dimensions.


This blog seems more like an reiteration of the tutorial than a reflection, but hey, it means I was listening carefully to each word. :D

Note-to-self:
- The process of analyzing your data for making informed decisions using data from the OLTPs, and the data that results, are collectively called business intelligence.
- An OLAP database is a specialized database designed to help you extract business intelligence information from your data.

Resources: