Ad Code

CS - 614 || Quiz no 1 Solved Spring 2020 || Cs - 614 || Data warehouse Quiz 2020 || Vicky Lab


Solution File:

What is a Data Warehouse? 

A Data Warehouse is not something shrink-wrapped i.e. you take a set of CDs and install into a box and soon you have a Data Warehouse up and running. A Data Warehouse evolves over time, you don’t buy it. Basically it is about taking/collecting data from different heterogeneous sources. Heterogeneous means not only the operating system is different but so is the underlying file format, different databases, and even with same database systems different representations for the same entity. This could be anything from different columns names to different data types for the same entity. 

Companies collect and record their own operational data, but at the same time they also use reference data obtained from external sources such as codes, prices etc. This is not the only external data, but customer lists with their contact information are also obtained from external sources. Therefore, all this external data is also added to the data warehouse.  

As mentioned earlier, even the data collected and obtained from within the company is not standard for a host of different reasons. For example, different operational systems being used in the company were developed by different vendors over a period of time, and there is no or minimal evenness in data representation etc. When that is the state of affairs (and is normal) within a company, then there is no control on the quality of data obtained from external sources. Hence all the data has to be transformed into a uniform format, standardized and integrated before it can go into the data warehouse.  In a decision support environment, the end user i.e. the decision maker is interested in the big picture. Typical DSS queries do not involve using a primary key or asking questions about a particular customer or account. DSS queries deal with number of variables spanning across number of tables (i.e. join operations) and looking at lots of historical data. As a result large number of records are processed and retrieved. For such a case, specialized or different database architectures/topologies are required, such as the star schema. We will cover this in detail in the relevant lecture.

Typical Queries  

OLTP (On Line Transaction Processing) specific query  

Select tx_date, balance from tx_table

Where account_ID = 23876; 

D DW WH H  s sp pe ec ci if fi ic c  q qu ue er ry y  

Select balance, age, sal, gender from customer_table and tx_table

Where age between (30 and 40) and

Education = ‘graduate’ and

CustID.customer_table = Customer_ID.tx_table;

Post a Comment

0 Comments

Close Menu