Kolmogorov’s Probability Theory: Detailed Explanation, Proofs, and Derivations
Andrey Kolmogorov formulated the axioms of probability theory in the 1930s, giving the subject a rigorous, measure-theoretic foundation. These axioms remain the basis of modern probability theory.
Kolmogorov’s Axioms
Let \( (\Omega, \mathcal{F}, \mathbb{P}) \) be a probability space, where \( \Omega \) is the sample space, \( \mathcal{F} \) is a σ-algebra of events, and \( \mathbb{P} \) is a probability measure. Kolmogorov’s axioms are as follows:
1. **Non-negativity**: For every event \( A \in \mathcal{F} \),
$$ \mathbb{P}(A) \geq 0 $$
2. **Normalization**: The probability of the sample space \( \Omega \) is 1,
$$ \mathbb{P}(\Omega) = 1 $$
3. **Countable additivity**: For any countable sequence of pairwise disjoint (mutually exclusive) events \( A_1, A_2, \ldots \) in \( \mathcal{F} \),
$$ \mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(A_i) $$
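To make the axioms concrete, here is a minimal Python sketch of a finite probability space: a fair six-sided die with the uniform measure. The names `omega`, `mass`, and `P` are illustrative choices for this example, and the asserts check each axiom on it (on a finite space, countable additivity reduces to finite additivity).

```python
# A minimal sketch of a finite probability space: a fair six-sided die.
# The names omega, mass, and P are illustrative, not any standard API.
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}                 # sample space
mass = {w: Fraction(1, 6) for w in omega}  # uniform point masses

def P(event):
    """Probability of an event (a subset of omega)."""
    return sum(mass[w] for w in event)

# Axiom 1 (non-negativity) and Axiom 2 (normalization):
assert all(P({w}) >= 0 for w in omega)
assert P(omega) == 1

# Axiom 3 (additivity) for two disjoint events:
evens, odds = {2, 4, 6}, {1, 3, 5}
assert P(evens | odds) == P(evens) + P(odds)
```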
Consequences of Kolmogorov’s Axioms
Several important properties and theorems in probability theory can be derived from Kolmogorov’s axioms.
Complement Rule
For any event \( A \in \mathcal{F} \), the probability of the complement of \( A \) is given by:
$$ \mathbb{P}(A^c) = 1 - \mathbb{P}(A) $$
**Proof**: Since \( A \cup A^c = \Omega \) and \( A \cap A^c = \emptyset \),
$$ \mathbb{P}(\Omega) = \mathbb{P}(A \cup A^c) = \mathbb{P}(A) + \mathbb{P}(A^c) $$
By the normalization axiom,
$$ 1 = \mathbb{P}(A) + \mathbb{P}(A^c) $$
Rearranging gives:
$$ \mathbb{P}(A^c) = 1 - \mathbb{P}(A) $$
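As a quick numeric check of the complement rule, here is a sketch on the same illustrative fair-die space (the uniform measure `P` is an assumption of the example):

```python
from fractions import Fraction

omega = set(range(1, 7))                   # fair die, uniform measure
def P(event):
    return Fraction(len(event), len(omega))

A = {1, 2}
assert P(omega - A) == 1 - P(A)            # P(A^c) = 1 - P(A)
```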
Probability of the Empty Set
The probability of the empty set \( \emptyset \) is 0,
$$ \mathbb{P}(\emptyset) = 0 $$
**Proof**: Since \( \emptyset \cup \Omega = \Omega \) and \( \emptyset \cap \Omega = \emptyset \),
$$ \mathbb{P}(\Omega) = \mathbb{P}(\emptyset \cup \Omega) = \mathbb{P}(\emptyset) + \mathbb{P}(\Omega) $$
By the normalization axiom,
$$ 1 = \mathbb{P}(\emptyset) + 1 $$
Rearranging gives:
$$ \mathbb{P}(\emptyset) = 0 $$
Finite Additivity
For any finite sequence of mutually exclusive events \( A_1, A_2, \ldots, A_n \) in \( \mathcal{F} \),
$$ \mathbb{P}\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \mathbb{P}(A_i) $$
**Proof**: Extend the sequence by setting \( A_i = \emptyset \) for \( i > n \). The extended sequence is still pairwise disjoint, so countable additivity applies, and since \( \mathbb{P}(\emptyset) = 0 \) every term beyond the \( n \)-th vanishes, leaving the finite sum above.
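A short check of finite additivity on the illustrative fair-die space from above:

```python
from fractions import Fraction

omega = set(range(1, 7))
def P(event):
    return Fraction(len(event), len(omega))

A1, A2, A3 = {1}, {2, 3}, {5}              # pairwise disjoint events
assert P(A1 | A2 | A3) == P(A1) + P(A2) + P(A3)
```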
Inclusion-Exclusion Principle
For any two events \( A \) and \( B \) in \( \mathcal{F} \),
$$ \mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B) $$
**Proof**: Write \( A \cup B \) as the disjoint union \( A \cup (B \setminus A) \), and \( B \) as the disjoint union \( (A \cap B) \cup (B \setminus A) \). By finite additivity,
$$ \mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B \setminus A) \quad \text{and} \quad \mathbb{P}(B) = \mathbb{P}(A \cap B) + \mathbb{P}(B \setminus A) $$
Solving the second equation for \( \mathbb{P}(B \setminus A) \) and substituting into the first yields
$$ \mathbb{P}(A \cup B) = \mathbb{P}(A) + \mathbb{P}(B) - \mathbb{P}(A \cap B) $$
Intuitively, adding \( \mathbb{P}(A) \) and \( \mathbb{P}(B) \) counts the overlap \( A \cap B \) twice, so it is subtracted once.
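The identity can also be verified numerically; the sketch below uses the same illustrative fair-die space with two overlapping events:

```python
from fractions import Fraction

omega = set(range(1, 7))
def P(event):
    return Fraction(len(event), len(omega))

A, B = {1, 2, 3}, {3, 4}                   # overlap: A & B == {3}
assert P(A | B) == P(A) + P(B) - P(A & B)
```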
Boole’s Inequality
For any countable sequence of events \( A_1, A_2, \ldots \) in \( \mathcal{F} \),
$$ \mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right) \leq \sum_{i=1}^{\infty} \mathbb{P}(A_i) $$
**Proof**: Disjointify the sequence: set \( B_1 = A_1 \) and \( B_i = A_i \setminus \bigcup_{j=1}^{i-1} A_j \) for \( i \geq 2 \). The \( B_i \) are pairwise disjoint, satisfy \( B_i \subseteq A_i \), and have the same union as the \( A_i \). Since \( A_i = B_i \cup (A_i \setminus B_i) \) is a disjoint union, \( \mathbb{P}(B_i) \leq \mathbb{P}(A_i) \) by finite additivity and non-negativity. Countable additivity then gives
$$ \mathbb{P}\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \mathbb{P}(B_i) \leq \sum_{i=1}^{\infty} \mathbb{P}(A_i) $$
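A numeric illustration on the same fair-die space; with overlapping events the inequality is strict:

```python
from fractions import Fraction

omega = set(range(1, 7))
def P(event):
    return Fraction(len(event), len(omega))

events = [{1, 2}, {2, 3}, {3, 4}]          # overlapping events
union = set().union(*events)
assert P(union) <= sum(P(A) for A in events)   # 2/3 <= 1, strict here
```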
Conditional Probability
The conditional probability of an event \( A \) given an event \( B \) with \( \mathbb{P}(B) > 0 \) is defined as:
$$ \mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} $$
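The definition translates directly into code. In the sketch below, `cond` is a hypothetical helper name for the conditional probability on the illustrative fair-die space:

```python
from fractions import Fraction

omega = set(range(1, 7))
def P(event):
    return Fraction(len(event), len(omega))

def cond(A, B):
    """P(A | B); defined only when P(B) > 0."""
    return P(A & B) / P(B)

A, B = {2, 4, 6}, {1, 2, 3, 4}             # even roll, given roll <= 4
assert cond(A, B) == Fraction(1, 2)        # {2, 4} out of {1, 2, 3, 4}
```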
Bayes’ Theorem
Bayes’ theorem relates the conditional probabilities of two events. For events \( A \) and \( B \) with \( \mathbb{P}(A) > 0 \) and \( \mathbb{P}(B) > 0 \),
$$ \mathbb{P}(A \mid B) = \frac{\mathbb{P}(B \mid A)\, \mathbb{P}(A)}{\mathbb{P}(B)} $$
**Proof**: By the definition of conditional probability,
$$ \mathbb{P}(A \mid B) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(B)} $$
and
$$ \mathbb{P}(B \mid A) = \frac{\mathbb{P}(A \cap B)}{\mathbb{P}(A)} $$
Rearranging the second equation gives:
$$ \mathbb{P}(A \cap B) = \mathbb{P}(B \mid A) \mathbb{P}(A) $$
Substituting this into the first equation,
$$ \mathbb{P}(A \mid B) = \frac{\mathbb{P}(B \mid A)\, \mathbb{P}(A)}{\mathbb{P}(B)} $$
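As a final sanity check, both sides of Bayes' theorem can be computed exactly on the fair-die space and compared:

```python
from fractions import Fraction

omega = set(range(1, 7))
def P(event):
    return Fraction(len(event), len(omega))

def cond(A, B):
    return P(A & B) / P(B)

A, B = {2, 4, 6}, {1, 2, 3, 4}
# The direct computation of P(A | B) equals the Bayes right-hand side:
assert cond(A, B) == cond(B, A) * P(A) / P(B)
```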