Data Warehouse-Friendly Database Design

OLAP vs. OLTP

Two types of database systems exist: OLTP and OLAP.

  1. Online Transactional Processing (OLTP): OLTP collects data and then synchronizes the data to OLAP. OLTP focuses on short but rapid and high-frequency querying.
  2. Online Analytical Processing (OLAP): OLAP reduces the load of the OLTP and improves the utilization rate of data along with analyzing the data. OLAP focuses on relatively low but complex operations, such as data mining.

Data Synchronization Method

There are two methods to synchronize data from OLTP to OLAP. Below is a brief description:

  1. Full Synchronization: This method synchronizes data from the entire table to the corresponding table of OLAP each time.
  2. Incremental Synchronization: This procedure synchronizes modified data to the corresponding table of OLAP each time.

Typical Table Fields

The following fields are contained when creating a table:

  1. primary key: Each table has a primary key
  2. created_time: This field indicates when the data was created and it remains unchanged.
  3. modified_time: This field indicates when the data was modified. It updates to the latest time whenever it is modified.
  4. deleted_flag: This field indicates that physical deletion is not enabled. If you want to delete data, you need to set this field as true. If this field is set as true, it remains the same, and the modified_time will not change.

Synchronization Process

Data warehouse includes the ODS layer and the DW layer. ODS stores OLTP’s raw data. The synchronization process mainly occurs on the ODS layer. If the modified_time is the record of the day, it indicates that the relevant data is the incremental data of the very day. Therefore, when synchronizing, you can make a judgment based on the modified_time.

  1. Possible Exceptions: For instance, the created_time of entry A is 20170304, but modified at 20170305 00:20, then the modified_time was changed to 20170305.
  2. The synchronization task would have been executed at 20170305 00:00. However, due to the schedule delay, it was not executed until 20170305 00:30.
  3. At this time, entry A will be missed, because the modified_time of entry A has been changed to 20170305 at the time of synchronization.
  4. Possible Solution: One of the solutions is to change the synchronization condition. If the modified_time of the record is of the same day, or the created_time of the record is of the same day, it indicates that relevant data is the incremental data of the very day.

Deduplication Process

  1. When summarizing the ODS layer tables on the DW layer, execution of deduplication becomes necessary. This is because the same entry may have been modified several times and during different time periods.
  2. Deduplication execution can occur by using the analysis function row_number() over (partition by primary_key order by modified_time desc). That is, get the latest modified_time record using the primary key. Once completed, you can delete the record whose deleted_flag is true during this process.

Data Verification Process

When execution of deduplication is implemented for the data of the ODS layer on the DW layer, you can get the same data as the production data. Verify whether it is true with the following method:

total = select count(*) from table_a
created = select count(*) from table_a where created_time = T+1
deleted = select count(*) from table_a where created_time <= T and modified_time < = T and deleted_flag = true
result = total - created - deleted

Conclusion

In this blog, we discussed about the types of database systems that exist in data warehousing and how we can achieve data synchronization by adopting different methods. Various processes could be followed to facilitate the creation of a database design that is supportive and coherent to the database system. The consolidation of a database system and a database design yield advantages such as enhanced data availability, reduced response time, along with efficient volume replication.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Alibaba Cloud

Alibaba Cloud

Follow me to keep abreast with the latest technology news, industry insights, and developer trends. Alibaba Cloud website:https://www.alibabacloud.com