An approach to accelerate policy iteration for probabilistic model checking of Markov decision processes using machine learning

Document Type: Original Article

Author

Department of Computer Science, Faculty of Mathematical Sciences, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran

Abstract

Markov decision processes (MDPs) are used in artificial intelligence and formal verification to model computer systems that exhibit both nondeterministic and stochastic behavior. Optimal reachability probabilities and expected accumulated rewards are the two main classes of properties used in probabilistic model checking. Value iteration and policy iteration are two well-known iterative numerical methods for approximating the optimal values; the main challenge with both is their high running time. In this paper, a new method is proposed that accelerates convergence to the optimal policy and thereby reduces the running time of policy iteration. The method uses machine learning to estimate near-optimal policies: for each class of MDP models, several small instances are used to train a classifier that predicts the optimal action of each state. The classifier then suggests a near-optimal initial policy for large models of the same MDP class, reducing the overall running time. Implementing the proposed method in the PRISM model checker shows a 50% improvement in average runtime.
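The seeding idea described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's implementation: the toy MDP, the `policy_iteration` helper, and the hand-picked "predicted" policy standing in for the classifier's output are all invented for the example.

```python
# Sketch: policy iteration over a toy MDP, optionally seeded with a
# near-optimal initial policy (standing in for the classifier's prediction).
# P[s][a] maps successor states to probabilities; R[(s, a)] is the action reward.

def evaluate(policy, P, R, gamma=0.95, tol=1e-9):
    """Iteratively compute V(s) = R[s, policy(s)] + gamma * E[V(s')]."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            a = policy[s]
            v = R[(s, a)] + gamma * sum(p * V[t] for t, p in P[s][a].items())
            delta = max(delta, abs(v - V[s]))
            V[s] = v
        if delta < tol:
            return V

def policy_iteration(P, R, initial_policy=None, gamma=0.95):
    """Return (optimal policy, values, number of evaluate/improve rounds)."""
    policy = (dict(initial_policy) if initial_policy
              else {s: next(iter(P[s])) for s in P})  # arbitrary cold start
    rounds = 0
    while True:
        rounds += 1
        V = evaluate(policy, P, R, gamma)
        stable = True
        for s in P:  # greedy improvement step
            best = max(P[s], key=lambda a: R[(s, a)] +
                       gamma * sum(p * V[t] for t, p in P[s][a].items()))
            if best != policy[s]:
                policy[s], stable = best, False
        if stable:
            return policy, V, rounds

# Toy 3-state MDP: state 2 is absorbing; the longer route 0 -> 1 -> 2 pays more.
P = {0: {'a': {1: 1.0}, 'b': {2: 1.0}},
     1: {'a': {0: 1.0}, 'b': {2: 1.0}},
     2: {'a': {2: 1.0}, 'b': {2: 1.0}}}
R = {(0, 'a'): 0.0, (0, 'b'): 1.0,
     (1, 'a'): 0.0, (1, 'b'): 2.0,
     (2, 'a'): 0.0, (2, 'b'): 0.0}

_, _, cold = policy_iteration(P, R)                            # unseeded start
_, _, warm = policy_iteration(P, R, {0: 'a', 1: 'b', 2: 'a'})  # "predicted" seed
print(cold, warm)  # → 3 1
```

On this toy model the seeded run converges in a single evaluate/improve round versus three for the cold start; the paper's contribution is obtaining such seeds for large models from classifiers trained on small instances of the same MDP class.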


