Data Warehouse and Its Characteristics

Summary

A data warehouse is a central repository that stores data from various sources. Data from transactional systems, relational databases, and other sources flow into it. In other words, a data warehouse is a collection of organizational data obtained from operational and external data sources. Its main characteristics are that it is subject-oriented, integrated, non-volatile, and time-variant. It is subject-oriented because it organizes information around themes or subjects, such as sales or finance, rather than around day-to-day organizational operations. It is integrated because it is built by combining data from different sources into a consistent form. It is non-volatile because data, once loaded, is not normally changed or deleted. It is time-variant because its records carry time keys such as date, month, and time, so that data can be analyzed across periods. Therefore, a data warehouse is a large, historical store of data drawn from different sources.
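The characteristics above can be illustrated with a small sketch. It is only a toy model under stated assumptions, not how a production warehouse is built: the names `Warehouse`, `WarehouseRow`, `load`, `by_subject`, and `as_of`, as well as the source names in the usage example, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)  # frozen: a stored row cannot be modified afterwards
class WarehouseRow:
    subject: str      # theme the row belongs to, e.g. "sales" (subject-oriented)
    source: str       # system the data came from (integrated)
    key: str
    value: float
    load_date: date   # time key attached to every row (time-variant)


class Warehouse:
    """Toy in-memory warehouse (hypothetical, for illustration only)."""

    def __init__(self):
        self._rows = []

    def load(self, subject, source, records, load_date):
        # Rows are only appended, never updated or deleted (non-volatile).
        for key, value in records.items():
            self._rows.append(
                WarehouseRow(subject, source, key, float(value), load_date)
            )

    def by_subject(self, subject):
        # Query by theme rather than by the operational system of origin.
        return [r for r in self._rows if r.subject == subject]

    def as_of(self, subject, day):
        # Historical view: rows for a subject loaded on or before a given day.
        return [
            r for r in self._rows
            if r.subject == subject and r.load_date <= day
        ]


if __name__ == "__main__":
    wh = Warehouse()
    wh.load("sales", "pos_system", {"store_1": 1200}, date(2024, 1, 31))
    wh.load("sales", "web_shop", {"store_1": 300}, date(2024, 2, 29))
    print(len(wh.by_subject("sales")))                 # rows from both sources
    print(len(wh.as_of("sales", date(2024, 1, 31))))   # only the January load
```

In this sketch, two different sources feed the same "sales" subject, and the load date makes it possible to ask what the warehouse contained at an earlier point in time.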

The Difference Between Data Warehouse and Data Mart

| Aspect | Data warehouse | Data mart |
| --- | --- | --- |
| Definition | It is a large store of data obtained from a wide range of sources within an organization. | It is a subset of a data warehouse that focuses on a specific operation in a business. For example, it can focus on sales or finance. |
| Users | All departments in an organization | A single department |
| Size | It is large because it accumulates data from all operations in a business. | It is relatively small because it collects information from a single department or unit. |
| Design | It uses a top-down design approach. | It uses a bottom-up design approach. |
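The idea of a data mart as a subset of a data warehouse can be sketched as a simple filter. This is a minimal illustration only: the function `build_data_mart`, the row fields, and the sample values are hypothetical, and real data marts are usually separate database schemas or tables rather than in-memory lists.

```python
def build_data_mart(warehouse_rows, subject):
    """Return the subset of warehouse rows that belong to one subject."""
    return [row for row in warehouse_rows if row["subject"] == subject]


# Hypothetical warehouse content covering more than one department.
warehouse_rows = [
    {"subject": "sales", "department": "Sales", "amount": 1200.0},
    {"subject": "finance", "department": "Finance", "amount": 560.0},
    {"subject": "sales", "department": "Sales", "amount": 300.0},
]

sales_mart = build_data_mart(warehouse_rows, "sales")
print(len(sales_mart), "of", len(warehouse_rows), "rows")  # prints "2 of 3 rows"
```

The mart is smaller than the warehouse and serves a single department, which matches the Users and Size rows of the comparison.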

Problems When Operational Data Are Integrated into the Data Warehouse

Several challenges arise when operational data are integrated into a data warehouse. One is data homogenization: because the sources use different data formats, forcing their data into a single format may result in the loss of valuable parts of the data. Another challenge is the difficulty of controlling access to data. An organization may be unable to decide which departments should have access to the warehouse. A further problem is the high cost of maintenance, since data arriving from different sources must be kept consistently organized over time.
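The homogenization problem can be made concrete with a short sketch. It assumes two hypothetical sources, "crm" and "billing", that write dates in different formats; the function `homogenize` and the field names are illustrative assumptions. The sketch converts both to one format and keeps the original value alongside it, which is one possible way to limit information loss, not the only one.

```python
from datetime import datetime

# Hypothetical date formats used by two operational sources.
SOURCE_DATE_FORMATS = {
    "crm": "%d/%m/%Y",      # e.g. 31/01/2024
    "billing": "%Y-%m-%d",  # e.g. 2024-01-31
}


def homogenize(source, record):
    """Convert a source record to a common format, keeping the raw date."""
    fmt = SOURCE_DATE_FORMATS[source]
    parsed = datetime.strptime(record["date"], fmt).date()
    return {
        "source": source,
        "date": parsed.isoformat(),   # common ISO 8601 format
        "raw_date": record["date"],   # original value kept to avoid losing it
        "amount": float(record["amount"]),
    }


if __name__ == "__main__":
    print(homogenize("crm", {"date": "31/01/2024", "amount": "12.50"}))
    print(homogenize("billing", {"date": "2024-01-31", "amount": "7"}))
```

Both records end up with the same `date` value in the common format, while `raw_date` preserves what each source originally supplied.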