Mathursday: Dynkin’s π-λ Theorem and CDF (Part 1)

I am starting a new series of posts called “Mathursday” (portmanteau: Maths + Thursday) containing short articles on important mathematical results. Today’s article is about Dynkin’s π-λ Theorem and how to use it to prove that a Cumulative Distribution Function (CDF) uniquely determines the probability distribution. This is one of those results that is intuitive and well known but requires a bit of analysis to formally state, let alone prove. Several such results in statistics rely on similar proof techniques. Generally, the treatment of these proofs is omitted entirely in statistics books (like Wasserman’s), and the alternative is to meander through a mathematics course on probability theory to extract the result. In this series of posts, I hope to give a direct, self-contained mathematical treatment.

The proof uses basic results and definitions from two important areas of mathematics: topology and measure theory. This series is divided into two posts. The first post (this one) covers the basics of these fields. The aim here is not to reproduce the complete treatment found in mathematics books but to give a concise, concentrated introduction to the different topics. You can skip to the next post if you already know these basics.

Set Theory

A set X is a collection of members; each member appears exactly once. A set can be finite (e.g., the natural numbers less than 20), countable (meaning its members can be enumerated, e.g., the set of odd natural numbers), or uncountable (e.g., the real numbers). A subset A of X (denoted A \subseteq X) is a set containing only members of X. The power set of X (denoted 2^X) is the set of all subsets of X. The set containing no members is called the empty set (denoted \emptyset). The cardinality of a set S (denoted |S|) is the number of members in the set.

For two sets A and B: their union (denoted A \cup B) is the set containing all members of A and B; their intersection (denoted A \cap B) is the set containing the members common to A and B; the set subtraction A-B is the set obtained by removing from A all members of B while retaining the others. If A \subseteq X is a subset of X then the complement of A is defined as A^c = X - A. Observe that for any two sets A, B we have A-B = A \cap B^c. There is a useful identity called De Morgan’s law which states, for any two sets A, B: (A\cap B)^c = A^c \cup B^c and (A\cup B)^c = A^c \cap B^c.
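These identities are easy to check mechanically. Below is a quick sanity check using Python’s built-in set type; the universe X and the sets A and B are arbitrary choices for illustration.

```python
# Verify A - B = A ∩ B^c and De Morgan's laws on a small example.
X = set(range(10))          # the universe
A = {1, 2, 3, 4}
B = {3, 4, 5, 6}

def complement(S):
    """S^c = X - S, the complement relative to the universe X."""
    return X - S

# Set subtraction via intersection with the complement.
assert A - B == A & complement(B)

# De Morgan's laws.
assert complement(A & B) == complement(A) | complement(B)
assert complement(A | B) == complement(A) & complement(B)
```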

While the above may look easy and simple, developing an axiom-based version of set theory has been quite challenging. Naive set theory suffered from Russell’s paradox, while modern attempts like ZFC had their own controversies. For our current purpose, we won’t go into these details.

General Topology

Definition (Topology): (X, \tau) is called a topological space if X is a set and \tau \subseteq 2^X is a collection of subsets of X, called open sets, that satisfies the following properties:
1) \emptyset, X \in \tau
2) \tau is closed under arbitrary union i.e., if A_i \in \tau\,\, \forall i \in I for an arbitrary (possibly uncountable) index set I, then \cup_{i \in I} A_i \in \tau.
3) \tau is closed under finite intersection i.e., if A_i \in \tau\,\, \forall i \in \{1, \cdots, k\} for k < \infty then \cap_{i=1}^k A_i \in \tau.
If \tau satisfies these properties then it is called a topology on X.

There is a reason why we call members of the topology open sets, and this will become clear when we consider the topology induced by a metric space.

Example 1: For any set X, both the power set 2^X and \{\emptyset, X\} are topologies on X. Let X = \{1, 2, 3\}; then \tau = \{ \emptyset, \{1\}, \{2, 3\}, \{1, 2, 3\}\} is a topology on X.
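On a finite set the topology axioms can be checked exhaustively. The following sketch verifies the example above; for a finite collection, closure under pairwise unions and intersections is enough.

```python
from itertools import combinations

X = frozenset({1, 2, 3})
tau = {frozenset(), frozenset({1}), frozenset({2, 3}), X}

# Axiom 1: the empty set and X belong to tau.
assert frozenset() in tau and X in tau

# Axioms 2 and 3: closure under union and intersection.
# (For a finite collection, pairwise closure implies closure in general.)
for A, B in combinations(tau, 2):
    assert A | B in tau
    assert A & B in tau
```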

Definition (Metric Space): (X, d) is a metric space if X is a set and d: X \times X \rightarrow \mathbb{R} is a function satisfying the following properties for all x, y, z \in X:
1) d(x, y) = 0 \Longleftrightarrow x=y (identity of indiscernibles)
2) d(x, y) = d(y, x) (symmetry)
3) d(x, y) \le d(x, z) + d(z, y) (triangle inequality)
If d satisfies the above properties then it is called a metric on X.

Observe that setting y=x in the triangle inequality gives us d(x, x) \le d(x, z)+ d(z, x). Using symmetry we get d(x, x) \le 2d(x, z). As d(x, x)=0 we get d(x, z) \ge 0. Hence, d is a non-negative valued function.

Example 2: A metric abstracts commonly used distance functions. For example, the Minkowski distance d(x, y) = \| x - y\|_p = \left( \sum_{i=1}^n |x_i-y_i|^p \right)^{\frac{1}{p}} for p \in [1, \infty) (with the limiting case \|x - y\|_\infty = \max_i |x_i - y_i|) is a metric. It is straightforward to verify that \| x - y\|_p satisfies the first two properties. Proving the triangle inequality is more challenging and uses Minkowski’s inequality; in fact, the triangle inequality fails for p < 1, which is why we restrict to p \ge 1.
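As a quick numerical sanity check (the three points below are my own hand-picked choices), we can confirm the triangle inequality for the Euclidean case p = 2 and watch it fail for p = 1/2:

```python
def minkowski(x, y, p):
    """Minkowski distance ||x - y||_p for finite p."""
    return sum(abs(a - b) ** p for a, b in zip(x, y)) ** (1.0 / p)

x, y, z = (0.0, 0.0), (1.0, 1.0), (1.0, 0.0)

# For p = 2 (Euclidean distance) the triangle inequality holds.
assert minkowski(x, y, 2) <= minkowski(x, z, 2) + minkowski(z, y, 2)

# For p = 0.5 it fails: d(x, y) = 4 but d(x, z) + d(z, y) = 2.
assert minkowski(x, y, 0.5) > minkowski(x, z, 0.5) + minkowski(z, y, 0.5)
```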

One can define various sorts of interesting geometric subsets using the metric. For example, we can define the open ball of radius \epsilon centered at x as B_\epsilon(x) = \left\{ y \in X \mid d(x, y) < \epsilon \right\}.

Metric spaces (X, d) provide a way to induce a useful topology on X. We define the topology induced by metric space as follows:

\tau = \left\{ S \subseteq X \,\,|\,\, \forall x \in S, \exists\,\, \epsilon > 0\,\, \mbox{ such that } \,\, B_\epsilon(x) \subseteq S \right\}

It is straightforward to verify that \tau is a topology. It is interesting to note that if X = \mathbb{R} and d(x, y) = |x-y|, then open intervals (a, b) (open in the geometric sense) are members of \tau. This means our standard geometric notion of an open set coincides with the topological one. Unless mentioned otherwise, we will take the topology on the real number space \mathbb{R}^n to be the one induced as above by the Euclidean distance.

Lemma 1: Every open set in \mathbb{R} is a countable union of disjoint open intervals.
Proof: Let S be an open set in \mathbb{R}. Define an equivalence relation on S by x \sim y if [\min(x, y), \max(x, y)] \subseteq S. It can be shown that each equivalence class is an open interval (a, b) for some a < b (allowing a = -\infty and b = \infty). Equivalence classes are disjoint by definition. Finally, each class contains a rational number, and distinct classes contain distinct rationals since the classes are disjoint. The proof is completed by observing that the set of rational numbers is countable.

Measure Theory

Measure theory is the field of mathematics concerned with assigning a size to subsets of a set. It provides the theoretical basis for probability theory, for defining Lebesgue integration, and for studying dynamical systems. Below we cover the main concepts of measure theory that will be useful in our proof.

Definition (Algebra of Subsets): For a set X, a nonempty collection of subsets C \subseteq 2^X is an algebra of subsets if:
1. A \in C \rightarrow A^c \in C (closed under complement).
2. A, B \in C \rightarrow A \cap B \in C (closed under finite intersection).

Lemma 2: If C is an algebra on X then \emptyset, X \in C, and for all A, B \in C we have A \cup B \in C.
Proof: As C is nonempty, let A \in C. Then A^c \in C and \emptyset = A \cap A^c \in C. From De Morgan’s law, A \cup B = (A^c \cap B^c)^c, which belongs to C as the algebra is closed under complement and finite intersection. Lastly, X = A \cup A^c \in C.

Definition (σ-algebra): For a set X, a σ-algebra is an algebra on X that is also closed under countable union i.e., if \{A_i\}_{i\in \mathbb{N}} is a countable collection of sets in the σ-algebra then \cup_{i\in \mathbb{N}} A_i also belongs to the σ-algebra.

Observe that the second condition of an algebra of subsets only implies closure under finite union, not countable union. For example, consider the collection S of subsets of \mathbb{R} that are finite or have finite complement. Then S is an algebra and is closed under finite union: the union of any two members is again finite or has finite complement. But a countable union of members of S can yield the set of natural numbers, which is neither finite nor has a finite complement. Therefore, the countable union condition makes a σ-algebra a strict generalization of an algebra of subsets.

Using De Morgan’s law we can show that any σ-algebra is also closed under countable intersection: let \{A_i\}_{i=1}^\infty be a countable collection of sets in the σ-algebra. Then \cap_{i=1}^\infty A_i = \left(\cup_{i=1}^\infty A_i^c\right)^c. As a σ-algebra is closed under complement and countable union, \cap_{i=1}^\infty A_i belongs to the σ-algebra. Another nice property of algebras of subsets and σ-algebras is that they are closed under set subtraction, since A - B = A \cap B^c; we leave the verification as it is quite straightforward.

Lemma 3: An arbitrary (not necessarily countable) intersection of σ-algebras on X is also a σ-algebra.

The intersection property of σ-algebras allows us to define a useful notion: given a collection of subsets Y \subseteq 2^X, we define \sigma(Y) as the smallest σ-algebra that contains Y. This can be formally defined as: \sigma(Y) = \cap \left\{ \Sigma \mid \Sigma \right. is a σ-algebra on X and \left. Y \subseteq \Sigma \right\}. This definition is well formed as the power set is a σ-algebra containing any collection of subsets, so the intersection is over a nonempty family.
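On a finite set, \sigma(Y) can be computed directly by closing the collection under complement and union until nothing new appears (on a finite X, countable unions reduce to finite ones). A minimal sketch, with the universe and generating collection chosen purely for illustration:

```python
def generated_sigma_algebra(X, Y):
    """Smallest sigma-algebra on the finite set X containing the
    collection Y, computed by iterating to a fixed point."""
    X = frozenset(X)
    sigma = {frozenset(), X} | {frozenset(A) for A in Y}
    while True:
        closure = ({X - A for A in sigma} |                 # complements
                   {A | B for A in sigma for B in sigma})   # unions
        if closure <= sigma:
            return sigma
        sigma |= closure

# sigma({{1}}) on X = {1, 2, 3, 4} is {∅, {1}, {2, 3, 4}, X}.
sigma = generated_sigma_algebra({1, 2, 3, 4}, [{1}])
assert sigma == {frozenset(), frozenset({1}),
                 frozenset({2, 3, 4}), frozenset({1, 2, 3, 4})}
```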

σ-algebra and Topology: One cannot fail to observe a remarkable similarity between the definitions of σ-algebra and topology. Both contain the empty set and X, and both are closed under unions (arbitrary unions for a topology, countable unions for a σ-algebra). However, a topology need not be closed under complement and is only required to be closed under finite intersection. This difference stems from their different purposes: for example, assigning a measure to a set intuitively suggests being able to assign a measure to its complement.

In general, measure theory and topology do not always sync; one only has to look at the different topics studied in the two fields. However, there is an important interplay between them that allows us to talk about measures on top of open sets. The Borel σ-algebra captures this idea:

Definition (Borel Set): Given a topological space (X, \tau), the Borel σ-algebra \mathbb{B} is the σ-algebra generated by the topology: \mathbb{B} = \sigma(\tau). A member of \mathbb{B} is called a Borel set.

Lemma 4: For the standard topology on \mathbb{R}, the Borel σ-algebra \mathbb{B}(\mathbb{R}) on \mathbb{R} is given by any of the following:

1) \mathbb{B}(\mathbb{R}) = \sigma\left(\left\{ (a, b) \mid a, b \in \mathbb{R} \right\}\right)
2) \mathbb{B}(\mathbb{R}) = \sigma\left(\left\{ (-\infty, a] \mid a \in \mathbb{R} \right\}\right)
3) \mathbb{B}(\mathbb{R}) = \sigma\left(\left\{ (-\infty, a) \mid a \in \mathbb{R} \right\}\right)
4) \mathbb{B}(\mathbb{R}) = \sigma\left(\left\{ [b, \infty) \mid b \in \mathbb{R} \right\}\right)
5) \mathbb{B}(\mathbb{R}) = \sigma\left(\left\{ (b, \infty) \mid b \in \mathbb{R} \right\}\right)

Proof: We prove the first two cases. The proofs for the remaining cases are similar.

1. From Lemma 1, any open set in \mathbb{R} is a countable union of disjoint open intervals. As a σ-algebra is closed under countable union, \mathbb{B}(\mathbb{R})=\sigma\left(\left\{ (a, b) \mid a, b \in \mathbb{R} \right\}\right).

2. Let A = \sigma\left(\left\{ (-\infty, a] \mid a \in \mathbb{R} \right\}\right). As A is closed under complement and (b, \infty) = (-\infty, b]^c, we can also express: A = \sigma\left(\left\{ (b, \infty) \mid b \in \mathbb{R} \right\}\right). However, (b, \infty) is an open set, so A is a σ-algebra generated by a collection of open sets. Therefore, A \subseteq \mathbb{B}(\mathbb{R}).

Conversely, let a < b; then (-\infty, b], (a, \infty) \in A. As a σ-algebra is closed under intersection, (a, b] = (-\infty, b] \cap (a, \infty) \in A. Any open interval can now be generated as a countable union: (a, b) = \cup_{n=1}^\infty \left(a, b - \frac{1}{n}\right] \in A. As A contains all open intervals, it contains all members of \mathbb{B}(\mathbb{R}) (using our proof of case 1). This implies \mathbb{B}(\mathbb{R}) \subseteq A. Hence, \mathbb{B}(\mathbb{R}) = A.

Definition (Measurable Space): A measurable space is a tuple (X, \Sigma) where X is a set and \Sigma is a σ-algebra on X.

Example 3: (\mathbb{R}, \mathbb{B}) is a measurable space where the real number line is the set and set of all Borel sets is our σ-algebra.

The next notion is useful in defining the notion of random variables in probability theory.

Definition (Measurable Function): A function f: (X_1, \Sigma_1) \rightarrow (X_2, \Sigma_2) is called measurable if (X_1, \Sigma_1), (X_2, \Sigma_2) are measurable spaces, and \forall E \in \Sigma_2, f^{-1}(E) \in \Sigma_1.

In the above definition, f^{-1}(E) = \left\{ x \in X_1 \mid f(x) \in E \right\}. The job of a measurable function is to ensure that we stay within the members of the σ-algebras when doing calculations.
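For finite measurable spaces, measurability can be checked directly from this definition. The toy spaces and functions below are my own choices for illustration:

```python
def preimage(f, E, X1):
    """f^{-1}(E) = {x in X1 : f(x) in E}."""
    return frozenset(x for x in X1 if f[x] in E)

X1 = {1, 2, 3, 4}
Sigma1 = {frozenset(), frozenset({1, 2}), frozenset({3, 4}), frozenset(X1)}
X2 = {"a", "b"}
Sigma2 = {frozenset(), frozenset({"a"}), frozenset({"b"}), frozenset(X2)}

# f is constant on {1, 2} and on {3, 4}, so every preimage lands in Sigma1.
f = {1: "a", 2: "a", 3: "b", 4: "b"}
assert all(preimage(f, E, X1) in Sigma1 for E in Sigma2)

# g splits {1, 2}: g^{-1}({"a"}) = {1, 3}, which is not in Sigma1,
# so g is not measurable with respect to these sigma-algebras.
g = {1: "a", 2: "b", 3: "a", 4: "b"}
assert not all(preimage(g, E, X1) in Sigma1 for E in Sigma2)
```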

Definition (Measure): Let X be a set and \Sigma be a σ-algebra on X. A measure \mu: \Sigma \rightarrow [0, \infty] is a function satisfying the following properties:
1. \mu(\emptyset) = 0 (null empty set)
2. If \{E_i\}_{i=1}^\infty is a countable collection of pairwise disjoint members of \Sigma then \mu(\cup_{i=1}^\infty E_i) = \sum_{i=1}^\infty \mu(E_i) (σ-additivity).

σ-additivity also implies finite additivity: let \{E_1, \cdots, E_k\} be a finite collection of pairwise disjoint members of \Sigma, and define E_j = \emptyset for all j > k. Then \{E_i\}_{i=1}^\infty is a countable collection of pairwise disjoint members (yes, \emptyset is disjoint from itself). Therefore, \mu(\cup_{i=1}^k E_i) = \mu(\cup_{i=1}^\infty E_i) = \sum_{i=1}^\infty \mu(E_i) = \sum_{i=1}^k \mu(E_i), where the last equality uses \mu(\emptyset) = 0.

A measure is called finite if \forall E \in \Sigma, \mu(E) < \infty. If a member E \in \Sigma has \mu(E) = 0 then we call E a null set. Intuitively, null sets are the negligible subsets in the σ-algebra.

Example 4: If X is finite and the σ-algebra is the power set then \mu(E) = |E| is a measure. This measure is called the counting measure.
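A quick check of the counting measure against the two defining properties, on pairwise disjoint sets of my own choosing:

```python
# Counting measure on subsets of a finite X: mu(E) = |E|.
mu = len

E1, E2, E3 = {0, 1}, {2, 3, 4}, {7}

# The sets are pairwise disjoint...
assert not (E1 & E2) and not (E1 & E3) and not (E2 & E3)

# ...so the measure of the union is the sum of the measures.
assert mu(E1 | E2 | E3) == mu(E1) + mu(E2) + mu(E3) == 6

# Null empty set.
assert mu(set()) == 0
```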

Measures are intuitively supposed to generalize the common notions of length, area, and volume. In general, however, constructing a useful measure such as the Lebesgue measure is quite challenging. See Hunter’s lecture notes on measure theory for a full treatment.

Definition (Probability Measure): Let X be a set and \Sigma be a σ-algebra on X. A probability measure P is a measure with the unitarity property: P(X) = 1.

When talking about probability measures, we understand X as the set of elementary outcomes and \Sigma as a collection of events. For example, if we flip a coin two times then an elementary outcome is [H, H], indicating two heads, and an event would be the set of outcomes with at least one head.

The need for σ-algebras: The definition of a σ-algebra looks opaque. Why can we not simply use the power set as the domain of our measure? The answer has an interesting history, arising from the discovery of non-measurable sets at the beginning of the 20th century. Mathematicians found that if we allow all possible subsets of \mathbb{R}, then no translation-invariant measure exists that generalizes the notion of the length of an interval, i.e., \mu([a, b]) = b-a! Non-measurable sets like the Vitali set were discovered. The concept of a σ-algebra was therefore introduced to narrow down the choice of subsets while maintaining useful properties. For example, if we talk about the probability of an event E then we may also want the probability of it not taking place, i.e., E^c, hence our σ-algebra must contain E^c.

Definition (Measure Space and Probability Space): A measure space is a tuple (X, \Sigma, \mu) where (X, \Sigma) is a measurable space and \mu is a measure on \Sigma. A measure space is called a probability space if \mu is a probability measure i.e., \mu(X) = 1.

Definition (Random Variables): A real-valued random variable g: (X, \Sigma, P) \rightarrow (\mathbb{R}, \mathbb{B}) is a measurable function from a probability space  (X, \Sigma, P) to the measurable space (\mathbb{R}, \mathbb{B}).

When considering the measurability of the random variable we ignore the probability measure P and consider (X, \Sigma) as our input measurable space. Further, even though we use the Borel σ-algebra as our choice of σ-algebra on \mathbb{R}, we could use any other σ-algebra.

Cumulative Distribution Function: Given a random variable g: (X, \Sigma, P) \rightarrow (\mathbb{R}, \mathbb{B}), the cumulative distribution function (CDF) is the function G: \mathbb{R} \rightarrow [0, 1] defined as G(z) = P(g^{-1}((-\infty, z])).

This definition is a bit involved. First, observe that the closed ray (-\infty, z] \in \mathbb{B} for any value of z \in \mathbb{R}. Therefore, as a random variable is a measurable function, g^{-1}((-\infty, z]) \in \Sigma. As P is a probability measure on \Sigma, P(g^{-1}((-\infty, z])) \in [0, 1].

The name CDF comes from thinking of G(z) as the total accumulated probability measure assigned to the outcomes that map to a value less than or equal to z, as z varies from -\infty to +\infty. It is straightforward to verify that a CDF is a non-decreasing function. Further, it can be shown that \lim_{z \rightarrow -\infty} G(z) = 0 and \lim_{z \rightarrow \infty} G(z) = 1.
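The two-coin-flip example from earlier makes this concrete. A minimal sketch: the random variable g counts heads, and G(z) sums the probability of the outcomes mapping to at most z.

```python
from itertools import product

# Probability space for two fair coin flips: X = {HH, HT, TH, TT}.
outcomes = list(product("HT", repeat=2))
P = {w: 0.25 for w in outcomes}          # uniform probability measure
g = {w: w.count("H") for w in outcomes}  # random variable: number of heads

def G(z):
    """CDF: G(z) = P(g^{-1}((-inf, z]))."""
    return sum(P[w] for w in outcomes if g[w] <= z)

assert G(-1) == 0.0   # no outcome has fewer than 0 heads
assert G(0) == 0.25   # only TT
assert G(1) == 0.75   # TT, HT, TH
assert G(2) == 1.0    # all outcomes; note G is non-decreasing
```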

We are now ready to state the main theorem that we prove in the next post.

Theorem: Let f: (X_1, \Sigma_1, P_1) \rightarrow (\mathbb{R}, \mathbb{B}) and g: (X_2, \Sigma_2, P_2) \rightarrow (\mathbb{R}, \mathbb{B}) be two random variables with CDFs F and G respectively. If F(x) = G(x) \,\, \forall x \in \mathbb{R}, then P_1(f^{-1}(A))=P_2(g^{-1}(A)) \,\, \forall A \in \mathbb{B}(\mathbb{R}).

Note that we do not require the two random variables to have the same input space or σ-algebra. The theorem states that if two real-valued random variables have the same CDF then they induce the same distribution over the real number line.
