Article type: Research Article
Author
Department of Computer Science, Faculty of Mathematical Sciences, Vali-e-Asr University of Rafsanjan, Rafsanjan, Iran
Abstract
Keywords
Subjects
Article title [English]
Author [English]
Markov decision processes (MDPs) are used in artificial intelligence and formal verification to model computer systems with both nondeterministic and stochastic behavior. Optimal reachability probabilities and expected accumulated rewards are the two main classes of properties considered in probabilistic model checking. Value iteration and policy iteration are two well-known iterative numerical methods for approximating the optimal values; a main challenge of these approaches is their high running time. In this paper, a new method is proposed to accelerate convergence to the optimal policy. The method uses machine learning to estimate optimal policies: for each class of MDP models, several small instances are used to train a classifier, which then predicts the optimal action of each MDP model and suggests a near-optimal policy for large models of the same MDP class. Implementing the proposed method in the PRISM model checker shows a 50% improvement in average runtime.
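To make the iterative baseline concrete, the following is a minimal sketch of value iteration for maximal reachability probabilities on a small MDP. The states, actions, and transition probabilities are illustrative assumptions, not taken from the paper, and the code does not reflect PRISM's internal implementation.

```python
# Sketch: value iteration for maximal reachability probabilities on a
# hypothetical MDP. trans[(s, a)] maps each successor state to its
# transition probability; target states have value 1 by definition.

def value_iteration(states, actions, trans, targets, eps=1e-8):
    """Approximate max reachability probabilities by iterating the
    Bellman operator until the largest per-state change is below eps."""
    v = {s: 1.0 if s in targets else 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            if s in targets:
                continue
            # Bellman update: maximize expected value over enabled actions.
            best = max(
                sum(p * v[t] for t, p in trans[(s, a)].items())
                for a in actions(s)
            )
            delta = max(delta, abs(best - v[s]))
            v[s] = best
        if delta < eps:
            return v

# Tiny illustrative MDP: from s0, action 'a' reaches the goal with
# probability 0.9, action 'b' with probability 0.5, so the optimal
# reachability probability at s0 is 0.9.
states = {"s0", "goal", "sink"}
trans = {
    ("s0", "a"): {"goal": 0.9, "sink": 0.1},
    ("s0", "b"): {"goal": 0.5, "sink": 0.5},
    ("goal", "a"): {"goal": 1.0},
    ("sink", "a"): {"sink": 1.0},
}
actions = lambda s: [a for (t, a) in trans if t == s]
v = value_iteration(states, actions, trans, targets={"goal"})
```

A learned classifier as described in the abstract would replace the maximization over actions with a predicted action per state, turning the update into the cheaper policy-evaluation step.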
Keywords [English]