Stochastic Game Theory
Stochastic game theory extends classical game theory by incorporating state dynamics and random transitions into the analysis of strategic interactions among players. This extension is particularly useful for modeling multi-stage scenarios in which outcomes depend both on the players' actions and on chance.
Mathematical Formulation
A stochastic game is defined by the tuple \( (S, A, P, r) \), where:
- \( S \) is a finite set of states.
- \( A = A_1 \times A_2 \times \dots \times A_n \) is the set of joint actions, where \( A_i \) is the action set for player \( i \).
- \( P(s' | s, a) \) is the state transition probability, representing the probability of moving to state \( s' \) given the current state \( s \) and joint action \( a \).
- \( r_i(s, a) \) is the immediate reward function for player \( i \), given the current state \( s \) and joint action \( a \).
The goal of each player is to maximize their expected total reward over time, considering both immediate and future rewards.
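For concreteness, the tuple can be represented directly with arrays. The sketch below is a minimal illustration in Python; the class name, field names, and array shapes are assumptions made for this example, not a standard API:

```python
import numpy as np
from dataclasses import dataclass

@dataclass
class StochasticGame:
    """A finite stochastic game (S, A, P, r) with joint actions flattened to one index.

    P[s, a, s'] is the probability of moving from state s to s' under joint action a;
    r[i, s, a]  is the immediate reward of player i in state s under joint action a.
    """
    n_states: int
    n_joint_actions: int
    n_players: int
    P: np.ndarray  # shape (n_states, n_joint_actions, n_states)
    r: np.ndarray  # shape (n_players, n_states, n_joint_actions)

    def __post_init__(self):
        # Basic sanity checks: shapes match and each P[s, a, :] is a distribution.
        assert self.P.shape == (self.n_states, self.n_joint_actions, self.n_states)
        assert self.r.shape == (self.n_players, self.n_states, self.n_joint_actions)
        assert np.allclose(self.P.sum(axis=-1), 1.0)
```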
Bellman Equation
The value function \( V_i(s) \) for player \( i \) represents the expected total discounted reward starting from state \( s \). It satisfies the Bellman equation:
$$ V_i(s) = \max_{a \in A} \left[ r_i(s, a) + \gamma \sum_{s' \in S} P(s' | s, a) V_i(s') \right] $$
where:
- \( \gamma \) is the discount factor, \( 0 \leq \gamma < 1 \).
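As a minimal computational sketch, the fixed point of this equation can be approximated by value iteration. The code below follows the equation exactly as written, with the maximum taken over joint actions; the function name, array shapes, and the default \( \gamma = 0.9 \) are assumptions of this example:

```python
import numpy as np

def bellman_value_iteration(P, r_i, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Iterate V <- max_a [ r_i(s, a) + gamma * sum_{s'} P(s'|s, a) V(s') ].

    P   : (n_states, n_joint_actions, n_states) transition probabilities
    r_i : (n_states, n_joint_actions) immediate rewards for player i
    The maximum ranges over joint actions, mirroring the displayed equation.
    """
    V = np.zeros(P.shape[0])
    for _ in range(max_iter):
        # Q[s, a] = r_i(s, a) + gamma * E[V(s') | s, a]
        Q = r_i + gamma * (P @ V)
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            break
        V = V_new
    return V
```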
Nash Equilibrium
A Nash equilibrium in a stochastic game is a strategy profile \( (\pi_1, \pi_2, \dots, \pi_n) \) such that no player can unilaterally improve their expected total reward by deviating from their strategy. Formally, for all players \( i \) and all states \( s \):
$$ V_i^{\pi_i, \pi_{-i}}(s) \geq V_i^{\pi_i', \pi_{-i}}(s) $$
where \( \pi_{-i} \) denotes the strategies of all players except \( i \), and \( \pi_i' \) is any alternative strategy for player \( i \).
Derivation of the Bellman Equation
To derive the Bellman equation, consider the value function \( V_i(s) \). The expected total reward starting from state \( s \) is the immediate reward plus the discounted expected reward of the next state:
$$ V_i(s) = \mathbb{E} \left[ \sum_{t=0}^{\infty} \gamma^t r_i(s_t, a_t) \mid s_0 = s \right] $$
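To make the recursion explicit, separate the reward at \( t = 0 \) from the remainder of the sum and condition on the first transition; the discounted tail is itself the value function evaluated at the next state:
$$ V_i(s) = \mathbb{E} \left[ r_i(s_0, a_0) + \gamma \sum_{t=1}^{\infty} \gamma^{t-1} r_i(s_t, a_t) \mid s_0 = s \right] = \mathbb{E} \left[ r_i(s, a_0) + \gamma V_i(s_1) \mid s_0 = s \right] $$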
By the principle of optimality, the value function can be decomposed as:
$$ V_i(s) = \max_{a \in A} \left[ r_i(s, a) + \gamma \sum_{s' \in S} P(s' | s, a) V_i(s') \right] $$
This equation captures the recursive nature of the value function, where the value at each state is determined by the immediate reward and the expected future rewards.
Proof of Nash Equilibrium
To prove that a strategy profile \( (\pi_1, \pi_2, \dots, \pi_n) \), in which each \( \pi_i \) is a best response to \( \pi_{-i} \), is a Nash equilibrium, we need to show that no player \( i \) can improve their expected total reward by unilaterally deviating. That is, for each player \( i \) and state \( s \):
$$ V_i^{\pi_i, \pi_{-i}}(s) \geq V_i^{\pi_i', \pi_{-i}}(s) $$
Assume, for contradiction, that there exists a deviation \( \pi_i' \) such that \( V_i^{\pi_i', \pi_{-i}}(s) > V_i^{\pi_i, \pi_{-i}}(s) \) for some state \( s \). This would contradict the assumption that \( \pi_i \) is a best response to \( \pi_{-i} \), i.e., that \( \pi_i \) attains the maximum expected total reward against the fixed strategies \( \pi_{-i} \). Hence no profitable unilateral deviation exists, and \( (\pi_1, \pi_2, \dots, \pi_n) \) is a Nash equilibrium.
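This no-deviation condition can also be checked numerically for a candidate profile: holding \( \pi_{-i} \) fixed reduces the game to a Markov decision process for player \( i \), whose best-response value can be compared with the value of \( \pi_i \). The sketch below does this from player 1's perspective in a two-player game with stationary mixed strategies; the array shapes, function name, and default \( \gamma \) are assumptions of this illustration:

```python
import numpy as np

def is_best_response(P, r1, pi1, pi2, gamma=0.9, tol=1e-6):
    """Check whether pi1 is a best response to the fixed strategy pi2.

    P   : (S, A1, A2, S) transitions      r1  : (S, A1, A2) rewards for player 1
    pi1 : (S, A1) and pi2 : (S, A2) stationary mixed strategies (rows sum to 1).
    """
    S = P.shape[0]
    # Average out player 2's fixed strategy: the game becomes an MDP for player 1.
    P1 = np.einsum('sabt,sb->sat', P, pi2)   # (S, A1, S)
    R1 = np.einsum('sab,sb->sa', r1, pi2)    # (S, A1)

    # Value of the candidate strategy pi1: solve (I - gamma * P_pi) V = R_pi.
    P_pi = np.einsum('sat,sa->st', P1, pi1)
    R_pi = np.einsum('sa,sa->s', R1, pi1)
    V_pi = np.linalg.solve(np.eye(S) - gamma * P_pi, R_pi)

    # Best-response value via value iteration on the induced MDP.
    V = np.zeros(S)
    for _ in range(10_000):
        V_new = (R1 + gamma * (P1 @ V)).max(axis=1)
        if np.max(np.abs(V_new - V)) < 1e-10:
            break
        V = V_new
    # pi1 is a best response iff its value matches the optimal value in every state.
    return bool(np.all(V_pi >= V - tol))
```

Running the same check with the two players' roles swapped verifies the inequality above for both players at all states simultaneously.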
Example: Stochastic Matrix Game
Consider a two-player stochastic matrix game with the following components:
- States \( S = \{s_1, s_2\} \)
- Actions \( A_1 = A_2 = \{a_1, a_2\} \)
- State transition probabilities and reward functions are given by the following tables:
State transition probabilities (each entry lists \( P(s_1 | s, a) \), \( P(s_2 | s, a) \) for the joint action \( a \) in the column header):
| \( s \) | \( (a_1, a_1) \) | \( (a_1, a_2) \) | \( (a_2, a_1) \) | \( (a_2, a_2) \) |
|---|---|---|---|---|
| \( s_1 \) | 0.7, 0.3 | 0.4, 0.6 | 0.5, 0.5 | 0.8, 0.2 |
| \( s_2 \) | 0.4, 0.6 | 0.6, 0.4 | 0.3, 0.7 | 0.7, 0.3 |
Reward function for player 1, \( r_1(s, a) \):
| \( s \) | \( (a_1, a_1) \) | \( (a_1, a_2) \) | \( (a_2, a_1) \) | \( (a_2, a_2) \) |
|---|---|---|---|---|
| \( s_1 \) | 5 | 1 | 2 | 3 |
| \( s_2 \) | 0 | 3 | 4 | 2 |
Reward function for player 2, \( r_2(s, a) \):
| \( s \) | \( (a_1, a_1) \) | \( (a_1, a_2) \) | \( (a_2, a_1) \) | \( (a_2, a_2) \) |
|---|---|---|---|---|
| \( s_1 \) | 3 | 2 | 1 | 0 |
| \( s_2 \) | 4 | 1 | 5 | 2 |
The Bellman equations for each player can be written and solved to find the optimal strategies and the Nash equilibrium.
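As a rough numerical sketch (not a general equilibrium solver), the tables above can be encoded as arrays and the value of any candidate pair of pure stationary strategies evaluated directly; the discount factor \( \gamma = 0.9 \) is an assumption added for this illustration:

```python
import numpy as np

gamma = 0.9  # discount factor assumed for this example; not specified by the tables

# P[s, a1, a2] = (P(s_1 | s, a), P(s_2 | s, a)), in the order of the tables above.
P = np.array([
    [[[0.7, 0.3], [0.4, 0.6]],   # s_1: (a_1, a_1), (a_1, a_2)
     [[0.5, 0.5], [0.8, 0.2]]],  #      (a_2, a_1), (a_2, a_2)
    [[[0.4, 0.6], [0.6, 0.4]],   # s_2
     [[0.3, 0.7], [0.7, 0.3]]],
])

r1 = np.array([[[5., 1.], [2., 3.]],   # player 1 rewards, indexed [s, a1, a2]
               [[0., 3.], [4., 2.]]])
r2 = np.array([[[3., 2.], [1., 0.]],   # player 2 rewards
               [[4., 1.], [5., 2.]]])

def policy_value(P, r, a1, a2, gamma):
    """Value of a pure stationary joint policy; a1[s] and a2[s] give the actions in state s."""
    S = P.shape[0]
    P_pi = np.array([P[s, a1[s], a2[s]] for s in range(S)])  # (S, S)
    r_pi = np.array([r[s, a1[s], a2[s]] for s in range(S)])  # (S,)
    return np.linalg.solve(np.eye(S) - gamma * P_pi, r_pi)

# Example: both players always play a_1 in both states.
a1 = a2 = np.array([0, 0])
print("Player 1 values:", policy_value(P, r1, a1, a2, gamma))
print("Player 2 values:", policy_value(P, r2, a1, a2, gamma))
```

Enumerating the pure stationary profiles and applying the deviation check from the previous section identifies any pure-strategy equilibria; mixed equilibria require a dedicated solver.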
Conclusion
Stochastic game theory provides a powerful framework for analyzing strategic interactions in dynamic and uncertain environments. By incorporating probabilistic state transitions, it extends the classical concepts of game theory to more realistic multi-stage settings, offering valuable insight into the behavior of rational agents in complex systems.