Determining the equilibrium solution in two-player dynamic discrete Markovian games with transition probabilities influenced by competitor strategies

Document Type: Original Article

Author

Department of Industrial Engineering, Payam Noor University, Tehran, Iran

Abstract

This paper focuses on a specific class of games known as Markovian dynamic games. In these games, each player's strategy is treated as a state of a Markov chain: at each stage, a player either keeps its current strategy or switches to another one with a certain probability, depending on its situation and position, and these transition probabilities also depend on the strategies of the competing players. This paper examines a two-player discrete Markov game with predetermined and independent transition probabilities that are influenced only by the competing player's strategy, and shows, through a numerical example, how equilibrium points of such a Markovian dynamic game are determined.
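
As a minimal illustrative sketch of such a game (not the paper's actual example), the Python snippet below builds a two-strategy case in which each player's strategy evolves as a Markov chain whose transition matrix is selected by the opponent's current strategy. All payoff matrices (A, B), transition matrices (P1, P2), and numerical values are hypothetical placeholders. The sketch computes the stationary distribution of the joint chain, the resulting long-run expected payoffs, and the pure-strategy equilibria of the underlying stage game via a best-response check.

import numpy as np
from itertools import product

# Hypothetical example: two strategies (0 and 1) per player.
# P1[j] = transition matrix governing player 1's strategy when player 2
#         currently plays j; P2[i] is defined symmetrically.
P1 = [np.array([[0.7, 0.3], [0.4, 0.6]]),   # used when player 2 plays 0
      np.array([[0.2, 0.8], [0.5, 0.5]])]   # used when player 2 plays 1
P2 = [np.array([[0.6, 0.4], [0.3, 0.7]]),   # used when player 1 plays 0
      np.array([[0.1, 0.9], [0.6, 0.4]])]   # used when player 1 plays 1

# Stage payoffs A[i, j] for player 1 and B[i, j] for player 2
# at the joint strategy profile (i, j). Placeholder numbers.
A = np.array([[3.0, 1.0], [4.0, 2.0]])
B = np.array([[2.0, 4.0], [1.0, 3.0]])

def joint_transition(P1, P2):
    """4x4 transition matrix over joint profiles (i, j): player 1 moves
    according to P1[j], player 2 according to P2[i]."""
    states = list(product(range(2), range(2)))
    T = np.zeros((4, 4))
    for s, (i, j) in enumerate(states):
        for t, (ip, jp) in enumerate(states):
            T[s, t] = P1[j][i, ip] * P2[i][j, jp]
    return T, states

def stationary_distribution(T):
    """Stationary distribution of the joint chain (left eigenvector of T
    associated with eigenvalue 1, normalized to sum to one)."""
    vals, vecs = np.linalg.eig(T.T)
    k = np.argmin(np.abs(vals - 1.0))
    pi = np.real(vecs[:, k])
    return pi / pi.sum()

T, states = joint_transition(P1, P2)
pi = stationary_distribution(T)

# Long-run expected payoffs under the stationary distribution.
v1 = sum(pi[s] * A[i, j] for s, (i, j) in enumerate(states))
v2 = sum(pi[s] * B[i, j] for s, (i, j) in enumerate(states))
print("stationary distribution:", dict(zip(states, np.round(pi, 3))))
print("long-run expected payoffs:", round(v1, 3), round(v2, 3))

# Pure-strategy Nash equilibria of the stage game (best-response check).
equilibria = [(i, j) for i in range(2) for j in range(2)
              if A[i, j] >= A[1 - i, j] and B[i, j] >= B[i, 1 - j]]
print("pure-strategy stage-game equilibria:", equilibria)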

Keywords

