Stochastic game
In game theory, a stochastic game, introduced by Lloyd Shapley in the early 1950s, is a repeated game with probabilistic transitions played by one or more players.[1] The game is played in a sequence of stages. At the beginning of each stage the game is in some state. The players select actions, and each player receives a payoff that depends on the current state and the chosen actions. The game then moves to a new random state whose distribution depends on the previous state and the actions chosen by the players. The procedure is repeated at the new state, and play continues for a finite or infinite number of stages. The total payoff to a player is often taken to be the discounted sum of the stage payoffs or the limit inferior of the averages of the stage payoffs.
Stochastic games generalize Markov decision processes to multiple interacting decision-makers, and they generalize strategic-form games to dynamic situations in which the environment changes in response to the players' choices.[2]
Two-player games
Stochastic two-player games on directed graphs are widely used for the modeling and analysis of discrete systems operating in an unknown (adversarial) environment. Possible configurations of a system and its environment are represented as vertices, and transitions correspond to actions of the system, its environment, or "nature". A run of the system then corresponds to an infinite path in the graph. Thus, a system and its environment can be seen as two players with antagonistic objectives, where one player (the system) aims at maximizing the probability of "good" runs, while the other player (the environment) aims at the opposite.
In many cases, an equilibrium value of this probability exists, but optimal strategies for both players may not exist.
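As an illustration of the graph view, the value of a reachability objective on such a game graph can be approximated by value iteration over the vertices. The graph, vertex names, and probabilities below are a made-up example, not taken from the source:

```python
# Value iteration for a two-player stochastic game on a directed graph.
# Vertices belong to the system ("MAX"), the environment ("MIN"), or
# nature ("AVG"); the system tries to maximize the probability of
# reaching the target vertex. Illustrative sketch on a hypothetical graph.

GRAPH = {
    # vertex: (owner, successors) -- AVG vertices list (probability, vertex)
    # pairs; MAX/MIN vertices list plain successor names.
    "s0": ("MAX", ["s1", "s2"]),
    "s1": ("MIN", ["s2", "lose"]),
    "s2": ("AVG", [(0.5, "win"), (0.5, "s0")]),
    "win": ("SINK", []),
    "lose": ("SINK", []),
}

def reachability_values(graph, target, iterations=200):
    """Approximate, for each vertex, the probability of reaching `target`
    under optimal play by both players (iterating from below converges to
    the least fixed point, which is the reachability value)."""
    v = {u: (1.0 if u == target else 0.0) for u in graph}
    for _ in range(iterations):
        new = {}
        for u, (owner, succ) in graph.items():
            if owner == "SINK":
                new[u] = v[u]                          # absorbing vertex
            elif owner == "MAX":
                new[u] = max(v[w] for w in succ)       # system's best move
            elif owner == "MIN":
                new[u] = min(v[w] for w in succ)       # environment's best reply
            else:                                      # nature moves at random
                new[u] = sum(p * v[w] for p, w in succ)
        v = new
    return v

vals = reachability_values(GRAPH, "win")
```

In this example the environment can force the play from `s1` straight to `lose`, so the system's optimal strategy at `s0` is to move to `s2` and rely on nature, which eventually reaches `win` with probability 1.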
We introduce basic concepts and algorithmic questions studied in this area, and we mention some long-standing open problems. Then, we mention selected recent results.
Theory
The ingredients of a stochastic game are: a finite set of players $I$; a state space $S$ (either a finite set or a measurable space $(S,\mathcal S)$); for each player $i\in I$, an action set $A^i$ (either a finite set or a measurable space $(A^i,\mathcal A^i)$); a transition probability $P$ from $S\times A$, where $A=\times_{i\in I}A^i$ is the set of action profiles, to $S$, where $P(T\mid s,a)$ is the probability that the next state is in $T$ given the current state $s$ and the current action profile $a$; and a payoff function $g$ from $S\times A$ to $\mathbb R^I$, where the $i$-th coordinate of $g$, $g^i$, is the payoff to player $i$ as a function of the state $s$ and the action profile $a$.
The game starts at some initial state $s_1$. At stage $t$, players first observe $s_t$, then simultaneously choose actions $a^i_t\in A^i$, then observe the action profile $a_t=(a^i_t)_{i\in I}$, and then nature selects $s_{t+1}$ according to the probability $P(\cdot\mid s_t,a_t)$. A play of the stochastic game, $s_1,a_1,\ldots,s_t,a_t,\ldots$, defines a stream of payoffs $g_1,g_2,\ldots$, where $g_t=g(s_t,a_t)$.
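The stage dynamics above can be sketched as a short simulation. The two-state, two-action game, the transition law, and the payoffs below are hypothetical, chosen only to make the loop concrete:

```python
import random

# One play of a (hypothetical) two-player zero-sum stochastic game with
# two states and two actions per player. Illustrative sketch only.

STATES = ["s", "s'"]
ACTIONS = [0, 1]

def transition(state, a1, a2):
    """Nature's move P(. | state, action profile): stay put if the players
    match actions, otherwise switch state with probability 0.7."""
    if a1 == a2:
        return state
    other = "s'" if state == "s" else "s"
    return other if random.random() < 0.7 else state

def payoff(state, a1, a2):
    """Stage payoff g(state, action profile) to player 1; in a zero-sum
    game player 2 receives the negative of this."""
    return (1 if state == "s" else 2) * (1 if a1 == a2 else -1)

def play(n_stages, seed=0):
    """Run n_stages of the game under arbitrary (uniformly random)
    strategies and return the resulting stream of stage payoffs."""
    random.seed(seed)
    state, stream = "s", []
    for _ in range(n_stages):
        a1, a2 = random.choice(ACTIONS), random.choice(ACTIONS)
        stream.append(payoff(state, a1, a2))
        state = transition(state, a1, a2)
    return stream

stream = play(10)
```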
The discounted game $\Gamma_\lambda$ with discount factor $\lambda$ ($0<\lambda\le 1$) is the game where the payoff to player $i$ is $\lambda\sum_{t=1}^{\infty}(1-\lambda)^{t-1}g^i_t$. The $n$-stage game is the game where the payoff to player $i$ is $\bar g^i_n:=\frac1n\sum_{t=1}^n g^i_t$.
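Given a stream of stage payoffs $g_1,g_2,\ldots$, both evaluations can be computed directly. A minimal sketch (the constant payoff stream used below is an arbitrary illustration):

```python
def discounted_payoff(stream, lam):
    """lam * sum_t (1 - lam)^(t-1) * g_t.  The normalization by lam means
    a constant stream of c's evaluates to (approximately) c."""
    return lam * sum((1 - lam) ** t * g for t, g in enumerate(stream))

def n_stage_average(stream, n):
    """Average of the first n stage payoffs."""
    return sum(stream[:n]) / n

stream = [1.0] * 1000          # constant payoff stream g_t = 1
d = discounted_payoff(stream, 0.1)
a = n_stage_average(stream, 50)
# Both evaluations of a constant stream equal the constant (up to the
# truncation of the infinite discounted sum at 1000 stages).
```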
The value $v_n(s_1)$, respectively $v_\lambda(s_1)$, of a two-person zero-sum stochastic game $\Gamma_n$, respectively $\Gamma_\lambda$, with finitely many states and actions exists, and Truman Bewley and Elon Kohlberg (1976) proved that $v_n(s_1)$ converges to a limit as $n$ goes to infinity and that $v_\lambda(s_1)$ converges to the same limit as $\lambda$ goes to $0$.
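With finitely many states and actions, the discounted value can be approximated numerically by iterating Shapley's fixed-point operator $v(s)\mapsto\operatorname{val}\bigl[\lambda g(s,a,b)+(1-\lambda)\sum_{s'}P(s'\mid s,a,b)\,v(s')\bigr]$, which is a contraction. The sketch below assumes two actions per player (so the inner matrix game has a closed-form value) and uses the classic "Big Match" as its example game; the helper names are ours, not from the source:

```python
# Approximating the discounted value v_lambda of a two-person zero-sum
# stochastic game by iterating Shapley's operator. Sketch, assuming two
# actions per player.

def matrix_game_value(m):
    """Value of a 2x2 zero-sum matrix game (row player maximizes)."""
    (a, b), (c, d) = m
    maximin = max(min(a, b), min(c, d))
    minimax = min(max(a, c), max(b, d))
    if maximin == minimax:                      # pure saddle point
        return maximin
    return (a * d - b * c) / (a + d - b - c)    # fully mixed optimal play

def discounted_values(states, g, P, lam, iters=500):
    """g[s][a][b]: stage payoff to player 1; P[s][a][b]: dict mapping next
    states to probabilities. Iterates v <- val[lam*g + (1-lam)*E v]."""
    v = {s: 0.0 for s in states}
    for _ in range(iters):
        v = {s: matrix_game_value(
                 [[lam * g[s][a][b]
                   + (1 - lam) * sum(p * v[t] for t, p in P[s][a][b].items())
                   for b in (0, 1)] for a in (0, 1)])
             for s in states}
    return v

# The Big Match: staying in "s" repeats matching-pennies payoffs, while
# player 1's second action absorbs the game forever at payoff 0 or 1.
states = ["s", "0*", "1*"]
g = {"s": [[1, 0], [0, 1]], "0*": [[0, 0], [0, 0]], "1*": [[1, 1], [1, 1]]}
P = {"s": [[{"s": 1.0}, {"s": 1.0}], [{"0*": 1.0}, {"1*": 1.0}]],
     "0*": [[{"0*": 1.0}] * 2] * 2,
     "1*": [[{"1*": 1.0}] * 2] * 2}

v = discounted_values(states, g, P, lam=0.5)
```

For the Big Match, the discounted value at the non-absorbing state is 1/2 for every discount factor, which the iteration recovers.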
The "undiscounted" game $\Gamma_\infty$ is the game where the payoff to player $i$ is the "limit" of the averages of the stage payoffs. Some precautions are needed in defining the value of a two-person zero-sum $\Gamma_\infty$ and in defining the equilibrium payoffs of a non-zero-sum $\Gamma_\infty$. The uniform value $v_\infty$ of a two-person zero-sum stochastic game $\Gamma_\infty$ exists if for every $\varepsilon>0$ there is a positive integer $N$ and a strategy pair $\sigma_\varepsilon$ of player 1 and $\tau_\varepsilon$ of player 2 such that for every $\sigma$ and $\tau$ and every $n\ge N$ the expectation of $\bar g^1_n$ with respect to the probability on plays defined by $\sigma_\varepsilon$ and $\tau$ is at least $v_\infty-\varepsilon$, and the expectation of $\bar g^1_n$ with respect to the probability on plays defined by $\sigma$ and $\tau_\varepsilon$ is at most $v_\infty+\varepsilon$. Jean-François Mertens and Abraham Neyman (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a uniform value.[3]
If there is a finite number of players and the action sets and the set of states are finite, then a stochastic game with a finite number of stages always has a Nash equilibrium. The same is true for a game with infinitely many stages if the total payoff is the discounted sum.
The non-zero-sum stochastic game $\Gamma_\infty$ has a uniform equilibrium payoff $v_\infty$ if for every $\varepsilon>0$ there is a positive integer $N$ and a strategy profile $\sigma$ such that for every unilateral deviation by a player $i$, i.e., a strategy profile $\tau$ with $\tau^j=\sigma^j$ for all $j\ne i$, and every $n\ge N$ the expectation of $\bar g^i_n$ with respect to the probability on plays defined by $\sigma$ is at least $v^i_\infty-\varepsilon$, and the expectation of $\bar g^i_n$ with respect to the probability on plays defined by $\tau$ is at most $v^i_\infty+\varepsilon$. Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a uniform equilibrium payoff.[4]
The non-zero-sum stochastic game $\Gamma_\infty$ has a limiting-average equilibrium payoff $v_\infty$ if for every $\varepsilon>0$ there is a strategy profile $\sigma$ such that for every unilateral deviation by a player $i$, i.e., a strategy profile $\tau$ with $\tau^j=\sigma^j$ for all $j\ne i$, the expectation of the limit inferior of the averages of the stage payoffs with respect to the probability on plays defined by $\sigma$ is at least $v^i_\infty-\varepsilon$, and the expectation of the limit superior of the averages of the stage payoffs with respect to the probability on plays defined by $\tau$ is at most $v^i_\infty+\varepsilon$. Jean-François Mertens and Abraham Neyman (1981) proved that every two-person zero-sum stochastic game with finitely many states and actions has a limiting-average value,[3] and Nicolas Vieille has shown that all two-person stochastic games with finite state and action spaces have a limiting-average equilibrium payoff.[4] In particular, these results imply that these games have a value and an approximate equilibrium payoff, called the liminf-average (respectively, the limsup-average) equilibrium payoff, when the total payoff is the limit inferior (respectively, the limit superior) of the averages of the stage payoffs.
Whether every stochastic game with finitely many players, states, and actions, has a uniform equilibrium payoff, or a limiting-average equilibrium payoff, or even a liminf-average equilibrium payoff, is a challenging open question.
A Markov perfect equilibrium is a refinement of the concept of sub-game perfect Nash equilibrium to stochastic games.
Stochastic games have been combined with Bayesian games to model uncertainty over player strategies.[5] The resulting "stochastic Bayesian game" model is solved via a recursive combination of the Bayesian Nash equilibrium equation and the Bellman optimality equation.
Applications
Stochastic games have applications in economics, evolutionary biology and computer networks.[6][7] They are generalizations of repeated games which correspond to the special case where there is only one state.
Notes
- ^ Shapley, L. S. (1953). "Stochastic games". PNAS. 39 (10): 1095–1100. Bibcode:1953PNAS...39.1095S. doi:10.1073/pnas.39.10.1095. PMC 1063912. PMID 16589380.
- ^ Solan, Eilon; Vieille, Nicolas (2015). "Stochastic Games". PNAS. 112 (45): 13743–13746. doi:10.1073/pnas.1513508112. PMC 4653174. PMID 26556883.
- ^ a b Mertens, J. F. & Neyman, A. (1981). "Stochastic Games". International Journal of Game Theory. 10 (2): 53–66. doi:10.1007/BF01769259. S2CID 189830419.
- ^ a b Vieille, N. (2002). "Stochastic games: Recent results". Handbook of Game Theory. Amsterdam: Elsevier Science. pp. 1833–1850. ISBN 0-444-88098-4.
- ^ Albrecht, Stefano; Crandall, Jacob; Ramamoorthy, Subramanian (2016). "Belief and Truth in Hypothesised Behaviours". Artificial Intelligence. 235: 63–94. arXiv:1507.07688. doi:10.1016/j.artint.2016.02.004. S2CID 2599762.
- ^ Altman, E.; Avratchenkov, K.; Bonneau, N.; Debbah, M.; El-Azouzi, R.; Menasche, D. S. "Constrained Stochastic Games in Wireless Networks".
- ^ Djehiche, Boualem; Tcheukam, Alain; Tembine, Hamidou (2017-09-27). "Mean-Field-Type Games in Engineering". AIMS Electronics and Electrical Engineering. 1: 18–73. arXiv:1605.03281. doi:10.3934/ElectrEng.2017.1.18. S2CID 16055840.
Further reading
- Filar, J. & Vrieze, K. (1997). Competitive Markov Decision Processes. Springer-Verlag. ISBN 0-387-94805-8.
- Neyman, A. & Sorin, S. (2003). Stochastic Games and Applications. Dordrecht: Kluwer Academic Press. ISBN 1-4020-1492-9.
- Yoav Shoham; Kevin Leyton-Brown (2009). Multiagent systems: algorithmic, game-theoretic, and logical foundations. Cambridge University Press. pp. 153–156. ISBN 978-0-521-89943-7. (suitable for undergraduates; main results, no proofs)