Improving the Response Time of Analytical Queries in a Real-Time Data Warehouse Using Materialized View Concatenation

Authors

Abstract

A real-time data warehouse is a collection of recent and hierarchical data that managers use for decision-making by issuing online analytical queries. The volume of data extracted from data sources and loaded into the real-time data warehouse grows day by day, and as the volume of incoming data grows, the interference between loading operations and online analytical processing grows with it. These two challenges have become the most important problems in real-time data warehousing. This paper presents a method for improving the response time of analytical queries in a real-time data warehouse architecture by concatenating materialized views. The results of the queries executed in each real-time section are stored, and when data is transferred to the next section, these precomputed results are transferred as well. During transfer, each real-time section holds the accumulated data of its preceding section over several transfer stages. Consequently, the computed query results move together with the data, and the desired result can be obtained by concatenating the previously computed results, without re-running the queries. The proposed method reduces the response time of analytical queries and reduces the interference between data loading and concurrent, long-running queries. One challenge facing this research is that the proposed method is applied to a small volume of data in the real-time section; a further challenge is the changes required to use it on big data.

Keywords


Article Title [English]

Improvement of the Analytical Queries Response Time in Real-Time Data Warehouse using Materialized Views Concatenation

Authors [English]

  • Seyed Majid Shafaei
  • Babak Vaziri
  • Seyed Moostafa Shafaei
Abstract [English]

A real-time data warehouse is a collection of recent and hierarchical data that managers use for decision-making by issuing online analytical queries. The volume of data collected from data sources and loaded into the real-time data warehouse is constantly increasing. Moreover, as the volume of input data grows, the interference between loading operations and online analytical processing also increases. These two challenges have become the most important issues in real-time data warehousing. In this article, a method is presented to improve the response time of analytical queries in the real-time data warehouse architecture using materialized view concatenation. The method works in two steps: (1) the results of the queries executed in each real-time section are stored, and (2) when data is transferred to the next section, these stored results are transferred with it. Each real-time section contains the data of its previous section, accumulated over several transfer stages. As a result, the computed query results travel with the data, and the desired outcome can be obtained by combining the previously computed results without running the queries again. The proposed method reduces both the response time of analytical queries and the interference between data loading and the simultaneous, long-running execution of queries. This study faces two challenges: (1) the proposed method is applied to a small amount of data in the real-time section, and (2) changes to the method are required before it can be applied to big data.
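
The two steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the names `Partition`, `transfer`, and `answer` are hypothetical, and the sketch assumes a single distributive aggregate (a per-key `SUM`), for which partial results can be merged without rescanning rows. Other aggregates, such as an average, would need extra state (for example a sum and a count).

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Partition:
    """Hypothetical real-time section: raw rows plus a cached aggregate."""
    rows: list = field(default_factory=list)        # (group_key, amount) pairs
    view: Counter = field(default_factory=Counter)  # cached SUM(amount) per key

    def load(self, batch):
        """Load a batch and fold it into the cached aggregate once."""
        self.rows.extend(batch)
        for key, amount in batch:
            self.view[key] += amount


def transfer(src: Partition, dst: Partition) -> None:
    """Move rows to the next section and carry the cached result along."""
    dst.rows.extend(src.rows)
    dst.view.update(src.view)  # Counter.update adds values; rows are not rescanned
    src.rows.clear()
    src.view.clear()


def answer(partitions) -> dict:
    """Answer the query by combining the cached views of all sections."""
    total = Counter()
    for part in partitions:
        total.update(part.view)
    return dict(total)


# Usage: two sections, one transfer, then a query over both.
first, second = Partition(), Partition()
first.load([("A", 10), ("B", 5)])
transfer(first, second)      # data and its precomputed result move together
first.load([("A", 7)])       # new data arrives in the first section
result = answer([first, second])
```

In this sketch the query never touches `rows` at answer time; it only merges the cached views, which is the property the abstract relies on to shorten response time and reduce contention with loading.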

Keywords [English]

  • Real-time data warehouse
  • Online analytical processing
  • Partitioning
  • Materialized view
  • Data storage