ارائه الگوریتمی جهت تسریع روش تکرار سیاست در راستی‌آزمایی فرآیندهای تصمیم مارکوف با استفاده از یادگیری ماشین

محققی, محمدصادق

doi:10.22052/scj.2023.243360.1029

ارائه الگوریتمی جهت تسریع روش تکرار سیاست در راستی‌آزمایی فرآیندهای تصمیم مارکوف با استفاده از یادگیری ماشین

نوع مقاله : مقاله پژوهشی

نویسنده

محمدصادق محققی

گروه علوم کامپیوتر، دانشکده علوم ریاضی، دانشگاه ولیعصر رفسنجان، رفسنجان، ایران

10.22052/scj.2023.243360.1029

چکیده

فرآیندهای تصمیم مارکوف در هوش مصنوعی و راستی‌آزمایی رسمی برای مدل‌سازی سیستم‌های کامپیوتری که دارای رفتارهای تصادفی و غیرقطعی هستند، استفاده می‌شوند. دو دسته مهم از ویژگی‌هایی که در وارسی مدل احتمالاتی استفاده می‌شوند شامل احتمال بهینه رسیدن به حالت هدف و پاداش انباشته شده مورد انتظار هستند. تکرار مقدار و تکرار سیاست دو روش عددی تکراری شناخته شده برای تقریب مقادیر بهینه هستند. چالش اصلی این روش‌ها زمان اجرایی بالای آنها است. در این مقاله روشی جدید برای تسریع همگرایی به سیاست بهینه ارائه می‌شود که زمان اجرایی روش تکرار سیاست را کاهش می‌دهد. این روش بر پایه استفاده از یادگیری ماشین برای تخمین یک سیاست نزدیک به بهینه است. برای هر کلاس از مدل‌های فرآیند تصمیم مارکوف، تعدادی مدل کوچک را برای مرحله آموزش و ساخت دسته‌بند در نظر می‌گیریم. دسته‌بند ساخته شده در فرآیند یادگیری، برای پیش‌بینی کنش بهینه هر حالت فرآیند تصمیم مارکوف داده شده به کار می‌رود. این دسته‌بند همچنین برای پیشنهاد یک سیاست نزدیک به بهینه برای فرآیندهای تصمیم مارکوف بزرگ از همان دسته مدل‌ها، استفاده می‌شود تا زمان مصرفی کل را کاهش دهد. پیاده‌سازی روش ارائه شده در وارسی‌گر مدل PRISM نشان می‌دهد زمان اجرا به طور میانگین ۵۰ درصد کاهش می‌یابد.

کلیدواژه‌ها

موضوعات

وارسی مدل

عنوان مقاله [English]

An approach to accelerate policy iteration for probabilistic model checking of Markov decision processes using machine learning

نویسنده [English]

Mohammadsadegh Mohagheghi

Department of Computer science, Faculty of Mathematical science, Vali-e-Asr university of Rafsanjan, Rafsanjan, Iran

چکیده [English]

Markov decision processes (MDPs) are used in artificial intelligence and formal verification to model computer systems with nondeterministic and stochastic behaviors. Optimal reachability probabilities and expected accumulated rewards are two main classes of properties that are used in probabilistic model checking. Value iteration and policy iteration are two well-known iterative numerical methods to approximate the optimal values. A main challenge of these approaches is their high running time. In this paper, a new method is proposed for accelerating the convergence to the optimal policy that reduces the running time of the policy iteration method. This method is based on using machine learning to estimate near-optimal policies. For each class of MDP models, we consider several small models for the training and construction step of the classifier model. The classifier model is used to predict the optimal actions of each MDP model. The model is also used to suggest a near-optimal policy for large models of the same MDP class in order to reduce the overall running time. Implementing the proposed method in the PRISM model checker shows a 50% improvement in the average runtime.

کلیدواژه‌ها [English]

Formal verification
Probabilistic model checking
Markov decision processes
Policy iteration
Optimal reachability
Machine learning

مراجع

[1] C. Baier, J.P. Katoen, Principles of model checking, MIT Press, ISBN 978-0-262-02649-9, pp. 1-975, 2008.
[2] V. Forejt, M. Kwiatkowska, G. Norman, and D. Parker, “Automated Verification Techniques for Probabilistic Systems,” in Formal Methods for Eternal Networked Software Systems, vol. 6659, Springer, Berlin, Heidelberg, 2011, doi: 10.1007/978-3-642-21455-4_3.
[3] J.-P. Katoen, “The Probabilistic Model Checking Landscape,” in 31st Ann. ACM/IEEE Symp. Logic Comput. Sci. (LICS), New York, NY, USA, 2016, pp. 1-15.
[4] C. Hensel, S. Junges, J.P. Katoen, T. Quatmann, and M. Volk, “The probabilistic model checker Storm,” Int. J. Softw. Tools Technol. Transf, vol. 24, no. 4, pp. 589-610, 2022, doi: 10.1007/s10009-021-00633-z.
[5] M. Kwiatkowska, G. Norman, and D. Parker, “PRISM 4.0: Verification of Probabilistic Real-Time Systems,” in Computer Aided Verification, vol. 6806, Springer, Berlin, Heidelberg, 2011, doi: 10.1007/978-3-642-22110-1_47.
[6] T.P. Gros, H. Hermanns, J. Hoffmann, M. Klauck, and M. Steinmetz, “Deep Statistical Model Checking,” in Formal Tech. Distributed Obj. Comp. Syst. (FORTE), vol. 12136, Springer, Cham., 2020, doi: 10.1007/978-3-030-50086-3_6.
[7] C. Baier, J. Klein, L. Leuschner, D. Parker, and S. Wunderlich, “Ensuring the Reliability of Your Model Checker: Interval Iteration for Markov Decision Processes,” in Comput. Aided Verif. (CAV), vol. 10426, Springer, Cham., 2017, doi: 10.1007/978-3-319-63387-9_8.
[8] C. Baier, P.R. D’Argenio, and H. Hermanns, “On the probabilistic bisimulation spectrum with silent moves,” Acta Informatica, vol. 57, no. 3-5, pp. 465-512, 2020, doi: 10.1007/s00236-020-00379-2.
[9] T. Brazdil, C. Krishnendu, C. Martin, F. Vojtech, J. Kret?nsky, M. Kwiatkowska, D. Parker, and M. Ujma, “Verification of Markov decision processes using learning algorithms,” in 12th Int. Symp. Autom. Technol. Verif. Anal. (ATVA), Sydney, NSW, Australia, 2014, pp. 98-114, doi: 10.1007/978-3-319-11936-6_8.
[10] L. Gui, J. Sun, S. Song, Y. Liu, and J.S. Dong, “SCC-Based Improved Reachability Analysis for Markov Decision Processes,” in Formal Methods Softw. Eng. (ICFEM), vol. 8829, Springer, Cham., 2014, doi: 10.1007/978-3-319-11737-9_12.
[11] S. Najafi and S. Noferesti, “Determining dynamic time quantum in round-robin scheduling algorithm using machine learning,” Soft Comput. J., vol. 10, no. 2, pp. 32-43, 2022, doi: 10.22052/scj.2022.243181.1002 [In Persian].
[12] H. Veisi, H.R. Ghaedsharaf, and M. Ebrahimi, “Improving the Performance of Machine Learning Algorithms for Heart Disease Diagnosis by Optimizing Data and Features,” Soft Comput. J., vol. 8, no. 1, pp. 70-85, 2019, doi: 10.22052/8.1.70 [In Persian].
[13] R. Bouchekir and M.C. Boukala, “Toward Implicit Learning for the Compositional Verification of Markov Decision Processes,” in Verif. Eval. Comput. Commun. Syst. (VECoS), vol. 11181, Springer, Cham., 2018, doi: 10.1007/978-3-030-00359-3_13.
[14] S. Oleksandr and C.-G. Lee, “Accelerated modified policy iteration algorithms for Markov decision processes,” Math. Methods Oper. Res., vol. 78, no. 1, pp. 61-76, 2013, doi: 10.1007/s00186-013-0432-y.
[15] A. Rataj and B. Wona-Szczeniak, “Extrapolation of an Optimal Policy using Statistical Probabilistic Model Checking,” Fundam. Informaticae, vol. 157, no. 4, pp. 443-461, 2018, doi: 10.3233/FI-2018-1637.
[16] M. Mohagheghi and K. Salehi, “Machine Learning and Disk-based Methods for Qualitative Verification of Markov Decision Processes,” in Proc. 16th Int. Conf. Edu. Res. Ind. Appl. Integr. Harmon. Knowl. Transfer, vol. II, Kharkiv, Ukraine, 2020, pp. 74-88.
[17] M. Mohagheghi, J. Karimpour, and A. Isazadeh, “Improving Modified Policy Iteration for Probabilistic Model Checking,” Comput. Sci., vol. 23, no. 1, 2022, doi: 10.7494/csci.2022.23.1.4139.
[18] M. Mohagheghi, J. Karimpour, and A. Isazadeh, “Prioritizing methods to accelerate probabilistic model checking of discrete-time Markov models,” Comput. J., vol. 63, no. 1, pp. 105-122, 2020, doi: 10.1093/comjnl/bxz001.
[19] J. Karimpour, A. Isazadeh, M. Mohagheghi, and K. Salehi, “Improved Iterative Methods for Verifying Markov Decision Processes,” in Fundam. Softw. Eng. (FSEN), vol. 9392, Springer, Cham., 2015, doi: 10.1007/978-3-319-24644-4_14.
[20] M. Kwiatkowska, D. Parker, and H. Qu, “Incremental quantitative verification for Markov decision processes,” in IEEE/IFIP 41st Int. Conf. Dependable Syst. Networks (DSN), Hong Kong, China, 2011, pp. 359-370, doi: 10.1109/DSN.2011.5958249.
[21] A. Hartmanns, M. Klauck, D. Parker, T. Quatmann, and E. Ruijters, “The Quantitative Verification Benchmark Set,” in Tools Algorithms Constr. Anal. Syst. (TACAS), vol. 11427, Springer, Cham., 2019, doi: 10.1007/978-3-030-17462-0_20.
[22] F. Ciesinski, C. Baier, M. Grober, and J. Klein, “Reduction Techniques for Model Checking Markov Decision Processes,” in 5th Int. Conf. Quant. Eval. Syst., Saint-Malo, France, 2008, pp. 45-54, doi: 10.1109/QEST.2008.45.
[23] T. Salehi, M. Maadani, and M, Mahdavi, “Delay reduction based on Markov decision process in inter-vehicle wireless networks,” in 4th Int. Conf. Modern Stud. Comput. Sci. Inf. Technol., Mashhad, Iran, 2017 [In Persian].
[24] R. Sadeghian, “Determining the equilibrium solution in two-player dynamic discrete markovian games with transition probabilities influenced by competitor strategies,” Soft Comput. J., vol. 11, no. 1, pp. 48-59, 2022, doi: 10.22052/scj.2022.242848.0 [In Persian].