A new approach to accelerate policy iteration for probabilistic model checking of Markov decision processes using machine learning

Document Type: Original Article

Author

Department of Computer Science, Faculty of Mathematical Sciences, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran

Abstract

Markov decision processes (MDPs) are used in artificial intelligence and formal verification to model computer systems with both nondeterministic and stochastic behavior. Optimal reachability probabilities and expected accumulated rewards are the two main classes of properties used in probabilistic model checking. Value iteration and policy iteration are two well-known iterative numerical methods for approximating these optimal values; a main challenge of both is their high running time. In this paper, we propose a new method for accelerating convergence to the optimal policy, based on using machine learning to estimate optimal policies. For each class of MDP models, several small instances are solved and used to train a classifier. The classifier is then used to predict the optimal action in each state, suggesting a near-optimal policy for large models of the same MDP class. An implementation of the proposed method in the PRISM model checker shows a 50% improvement in average runtime.
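To illustrate the idea, the following is a minimal Python sketch, not the authors' PRISM implementation: several small MDP instances are solved exactly, a classifier is trained on their optimal state-action pairs, and its predictions seed policy iteration on a larger model. The random-MDP generator, the per-state features, the discount factor, and the decision-tree classifier are all illustrative assumptions.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

GAMMA = 0.95  # discount factor (assumed)

def random_mdp(n_states, n_actions, rng):
    # Random MDP: transition tensor P[a, s, s'] and reward matrix R[s, a].
    P = rng.dirichlet(np.ones(n_states), size=(n_actions, n_states))
    R = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
    return P, R

def policy_iteration(P, R, init_policy=None):
    # Standard policy iteration; returns the optimal policy and the
    # number of improvement steps until convergence.
    n_actions, n_states, _ = P.shape
    policy = (np.zeros(n_states, dtype=int) if init_policy is None
              else np.asarray(init_policy, dtype=int).copy())
    for it in range(1, 1000):
        # Evaluation: solve (I - GAMMA * P_pi) V = R_pi for the current policy.
        P_pi = P[policy, np.arange(n_states)]
        R_pi = R[np.arange(n_states), policy]
        V = np.linalg.solve(np.eye(n_states) - GAMMA * P_pi, R_pi)
        # Improvement: greedy one-step lookahead on the Q-values.
        Q = R + GAMMA * np.einsum('ast,t->sa', P, V)
        new_policy = Q.argmax(axis=1)
        if np.array_equal(new_policy, policy):
            return policy, it
        policy = new_policy
    return policy, it

def features(P, R):
    # Per-state features (an assumption): immediate rewards per action,
    # plus the expected best successor reward per action.
    succ = np.einsum('ast,t->sa', P, R.max(axis=1))
    return np.hstack([R, succ])

rng = np.random.default_rng(0)
N_ACTIONS = 4

# Training step: solve several small models of the class exactly.
X, y = [], []
for _ in range(20):
    P, R = random_mdp(30, N_ACTIONS, rng)
    opt, _ = policy_iteration(P, R)
    X.append(features(P, R))
    y.append(opt)
clf = DecisionTreeClassifier(random_state=0).fit(np.vstack(X), np.concatenate(y))

# Prediction step: seed policy iteration on a large model with the
# classifier's near-optimal policy and compare the iteration counts.
P, R = random_mdp(1000, N_ACTIONS, rng)
guess = clf.predict(features(P, R))
_, it_seeded = policy_iteration(P, R, init_policy=guess)
_, it_default = policy_iteration(P, R)
print(f"improvement steps: seeded={it_seeded}, default={it_default}")

The seeded run starts from a policy that is already close to optimal, so fewer improvement steps, and hence fewer linear-system solves, are typically needed; this is the source of the runtime savings the abstract reports.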

Articles in Press, Accepted Manuscript
Available Online from 22 August 2023
  • Received: 16 December 2021
  • Revised: 10 April 2023
  • Accepted: 24 June 2023