Optimizing Process of Data Extraction, Transformation and Load in Data Warehouse Based on Parallel Processing

Author

Abstract

Data warehouses (DWs) store data in a structure that facilitates data analysis. The Extract, Transform, and Load (ETL) process retrieves the required data from the source systems and loads it into the data warehouse. Although the structures of the source data (e.g., an ER model) and of the DW (e.g., a star schema) are usually specified, there is a clear lack of a standard model for representing ETL scenarios. Using various tools, the ETL process is designed in many different ways, depending on the source and destination data structures. The ETL process is a time and cost bottleneck in building a DW. Building on previously proposed methods for reducing the execution time and improving the efficiency of the ETL process, this paper proposes a more efficient method. The proposed method applies parallel processing techniques to the ETL process and achieves a 29% reduction in execution time.

Keywords