Markov decision processes (MDPs) are used in artificial intelligence and formal verification to model computer systems that exhibit both nondeterministic and stochastic behavior. Optimal reachability probabilities and expected accumulated rewards are the two main classes of properties used in probabilistic model checking. Value iteration and policy iteration are well-known iterative numerical methods for approximating these optimal values, but their high running time remains a major challenge. This paper proposes a new method for accelerating convergence to the optimal policy, based on using machine learning to estimate optimal policies. For each class of MDP models, several small instances are used to train a classifier; the classifier then predicts the optimal action in each state and suggests a near-optimal policy for large models of the same class. An implementation of the proposed method in the PRISM model checker shows a 50% improvement in average runtime.
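For context, policy iteration for maximal reachability probabilities alternates between evaluating the current policy and greedily improving it until no action changes. Below is a minimal sketch on a toy MDP; the model, state names, and actions are illustrative assumptions, not taken from the paper.

```python
# Policy iteration for maximal reachability probabilities on a toy MDP.
# mdp[state][action] = list of (successor, probability) pairs.
# The model below is purely illustrative.
mdp = {
    "s0": {"a": [("s1", 0.5), ("s2", 0.5)], "b": [("s2", 1.0)]},
    "s1": {"a": [("goal", 0.9), ("s0", 0.1)]},
    "s2": {"a": [("goal", 0.5), ("sink", 0.5)]},
    "goal": {"a": [("goal", 1.0)]},
    "sink": {"a": [("sink", 1.0)]},
}
GOAL = "goal"

def evaluate(policy, eps=1e-10):
    """Iteratively compute reachability probabilities under a fixed policy."""
    v = {s: (1.0 if s == GOAL else 0.0) for s in mdp}
    while True:
        delta = 0.0
        for s in mdp:
            if s == GOAL:
                continue
            new = sum(p * v[t] for t, p in mdp[s][policy[s]])
            delta = max(delta, abs(new - v[s]))
            v[s] = new
        if delta < eps:
            return v

def policy_iteration():
    # Start from an arbitrary policy and greedily improve it.
    policy = {s: next(iter(acts)) for s, acts in mdp.items()}
    while True:
        v = evaluate(policy)
        stable = True
        for s, acts in mdp.items():
            # Pick the action maximizing the one-step expected value.
            best = max(acts, key=lambda a: sum(p * v[t] for t, p in acts[a]))
            if (sum(p * v[t] for t, p in acts[best])
                    > sum(p * v[t] for t, p in acts[policy[s]]) + 1e-12):
                policy[s] = best
                stable = False
        if stable:
            return policy, v

policy, values = policy_iteration()
```

The paper's contribution can be read as replacing the arbitrary initial policy with one predicted by a classifier trained on small instances of the same model class, so fewer improvement rounds are needed.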
Mohagheghi, M. (2023). A new approach to accelerate policy iteration for probabilistic model checking of Markov decision processes using machine learning. Soft Computing Journal. doi: 10.22052/scj.2023.243360.1029