In this post I will build on the previous posts on probability theory, where I derived the main results of probability from the axioms of set theory. Now it is time to consider random variables, which are fundamental to all higher-level statistics. I recall finding this a slippery concept initially, but it is so foundational that there is no avoiding it if you want to understand more advanced work. This section does have a calculus prerequisite: it is important to know what integration is and what it does geometrically. If you want to review this, an excellent online resource is Paul's Online Notes.

A scientific experiment has many characteristics which can be measured. In most cases, an experimenter will focus on some characteristics in particular. Each outcome of an experiment can be associated with a number by specifying a rule which governs that association. The concept of a random variable formalises this connection between experimental outcomes and numbers.

A random variable (r.v.) $X$ is a function defined on a sample space, $S$, that associates a real number, $X(\omega) = x$, with each outcome $\omega$ in $S$. This concept is quite abstract and can be made more concrete by reflecting on an example.

Example 1: Consider tossing 2 balanced coins and noting the faces that land showing. Then the sample space is $S = \{HH, HT, TH, TT \}$. Define the random variable $X(\omega) = n$, where $n$ is the number of heads and $\omega$ is an outcome (a simple event such as $HH$). Then the possible values of the random variable are:

$X(\omega) = 0$ if $\omega = TT$

$X(\omega) = 1$ if $\omega \in \{HT, TH\}$

$X(\omega) = 2$ if $\omega = HH$

For Example 1, $X$ is a function which associates a real number with the outcomes of the experiment of tossing 2 coins. Here the r.v. is defined to count the number of heads. Two coins are flipped and an outcome $\omega$ is obtained; this outcome is an element of the sample space $S$. The random variable $X$ is applied to the outcome, and $X(\omega)$ maps it to a real number based on the characteristics observed in it. Let the observed outcome be $\omega = HT$. The function $X(\omega)$ counts how many heads were observed in $\omega$, which in this case gives $X(\omega) = 1$.

For each value of a random variable, there is a corresponding collection of underlying outcomes, i.e. an event. Through these events, we connect the values of a random variable with probability values. In the coin tossing example we have 4 equally likely outcomes, and the associated probabilities are:

$Pr(X(\omega) = 0) = \frac{1}{4}$ (there is one outcome in the sample space for which $X(\omega) = 0$)

$Pr(X(\omega) = 1) = \frac{2}{4}$ (there are two outcomes in the sample space for which $X(\omega) = 1$)

$Pr(X(\omega) = 2) = \frac{1}{4}$ (there is one outcome in the sample space for which $X(\omega) = 2$)

A random variable is represented by a capital letter and a particular realised value of a random variable is denoted by a corresponding lowercase letter. So $X$ can be a random variable and $x$ is a realised value of the random variable. In this short post we cover two types of random variables – Discrete and Continuous.

# Discrete Random Variables

A random variable $X$ is called discrete if it can assume only a finite or a countably infinite number of distinct values.

Example 2: In tossing 3 fair coins, define the random variable $X = \text{number of tails}$. Then $X$ can assume values 0,1,2,3. Connecting these values with probabilities yields

$Pr(X = 0) = Pr[\{HHH\}] = \frac{1}{8}$
$Pr(X = 1) = Pr[\{HHT\} \cup \{HTH\} \cup \{THH\}] = \frac{3}{8}$
$Pr(X = 2) = Pr[\{TTH\} \cup \{HTT\} \cup \{THT\}] = \frac{3}{8}$
$Pr(X = 3) = Pr[\{TTT\}] = \frac{1}{8}$
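These probabilities can be verified by brute-force enumeration of the sample space. Below is a minimal Python sketch (the variable names are my own, purely illustrative):

```python
from fractions import Fraction
from itertools import product

# Enumerate the 2^3 equally likely outcomes of tossing three fair coins
outcomes = list(product("HT", repeat=3))

# The random variable X counts the number of tails in an outcome
def X(omega):
    return omega.count("T")

# Pr(X = x) is the fraction of equally likely outcomes mapped to x
pmf = {x: Fraction(0) for x in range(4)}
for omega in outcomes:
    pmf[X(omega)] += Fraction(1, len(outcomes))
```

Using exact fractions rather than floats keeps the probabilities $\frac{1}{8}, \frac{3}{8}, \frac{3}{8}, \frac{1}{8}$ exact.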

So far so good; let's develop these ideas more systematically to obtain some basic definitions. In general, if we let the discrete random variable $X$ assume values $x_1, x_2, \ldots$ then we can define a probability function on the sample space.

A probability mass function, or probability function, of a discrete random variable $X$ is the function
$$f_{X}(x_i) = Pr(X = x_i),\quad i = 1, 2, \ldots$$
The probability function allows us to answer questions about the probabilities associated with realised values of a random variable. Applying this to Example 2, we can say the probability that $X$ takes the value $x = 2$ is $f_{X}(2) = Pr(X = 2) = \frac{3}{8}$. The probability function $f_{X}(x)$ is nonnegative, since probabilities cannot be negative.

A cumulative distribution function (cdf) $F_{X}(x)$ of the random variable $X$ is defined by

$F_{X}(x) = Pr(X \leq x) = \sum_{y \leq x} f_{X}(y)$ ,   $-\infty < x < \infty$

where the sum runs over all values $y$ in the range of $X$ with $y \leq x$.

The cdf of a random variable is a function which “collects” probabilities as $x$ increases. This will be defined in more detail later but applying it to example 2, we can ask questions like “what is the probability that $X$ is less than or equal to 2?”

$$F_{X}(2) = Pr(X \leq 2) = \sum_{y = 0}^{2} f_{X}(y) = f_{X}(0) + f_{X}(1) + f_{X}(2) = \frac{1}{8} + \frac{3}{8} + \frac{3}{8} = \frac{7}{8}$$

Since $X$ must take on one of the values in $\{x_1, x_2, \ldots\}$, it follows that as we collect all the probabilities
$$\sum_{i=1}^{\infty} f_{X}(x_i) = 1$$
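Both the accumulation in $F_X$ and this normalisation can be checked directly for Example 2. The sketch below simply hard-codes the pmf derived there:

```python
from fractions import Fraction

# pmf of X = number of tails in three fair coin tosses (Example 2)
pmf = {0: Fraction(1, 8), 1: Fraction(3, 8), 2: Fraction(3, 8), 3: Fraction(1, 8)}

def cdf(x):
    # F_X(x) collects the mass of every support point y <= x
    return sum(p for y, p in pmf.items() if y <= x)
```

Evaluating `cdf(2)` reproduces the $\frac{7}{8}$ computed above, and `cdf(3)` returns the total mass of 1.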
Let’s look at another example to make these ideas firm.

Example 3: Suppose that a fair coin is tossed twice such that the sample space is $S = \{HH, HT, TH, TT \}$. Let $X$ be the number of heads.

1. Find the probability function of $X$
2. Find the cumulative distribution function of $X$

Solution: Note that $f_{X}(x) = 0$ for any $x$ outside the range of $X$, so it is enough to specify the function at the values $X$ can actually take.

1. $Pr(X = 0) = \frac{1}{4}$, $Pr(X = 1) = \frac{2}{4}$, $Pr(X=2) = \frac{1}{4}$

2. Accumulating the probability function gives the cumulative distribution function:

$$F_{X}(0) = \sum_{y = 0}^{0} f_{X}(y) = f_{X}(0) = \frac{1}{4}$$
$$F_{X}(1) = \sum_{y = 0}^{1} f_{X}(y) = f_{X}(0) + f_{X}(1) = \frac{1}{4} + \frac{2}{4} = \frac{3}{4}$$
$$F_{X}(2) = \sum_{y = 0}^{2} f_{X}(y) = f_{X}(0) + f_{X}(1) + f_{X}(2) = \frac{1}{4} + \frac{2}{4} + \frac{1}{4} = 1$$

# Continuous Random Variables

To introduce the concept of a continuous random variable, let $X$ be a random variable. Suppose that there exists a nonnegative real-valued function $$f: \mathbb{R} \rightarrow [0, \infty)$$ such that for any interval $[a,b]$

$$Pr[X \in [a,b]] = \int_{a}^{b} f(t) dt$$

If the above holds, then $X$ is called a continuous random variable. The function $f$ is called the probability density function (pdf) of $X$.

Continuous random variables are used to model quantities which do not take discrete values, or cannot easily be restricted to discrete values, so that it makes more sense to work with intervals of values. For a given function $f$ to be a pdf, it must satisfy two conditions:

1. $f_{X}(x) \geq 0$ for all values of $x$
2. $\int_{-\infty}^{\infty} f_{X}(x) dx = 1$ (the infinities are a placeholder for the full range of $X$)

The cumulative distribution function (cdf) for a continuous random variable is given by
$$F_{X}(x) = Pr(X \leq x) = \int_{-\infty}^{x} f(t)dt$$

There is a relationship between the pdf and cdf of a continuous random variable which comes from the fundamental theorem of calculus. Let $X$ be a continuous random variable with pdf $f_{X}$; then at every point where $f_{X}$ is continuous,
$$\frac{dF_{X}(x)}{dx} = f_{X}(x)$$
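This relationship can be checked numerically with a finite difference. The sketch below uses the exponential distribution with rate 1 as a stand-in example (my choice, not from the text), where $F(x) = 1 - e^{-x}$ and $f(x) = e^{-x}$ for $x \geq 0$:

```python
import math

# Exponential(1) distribution: an illustrative continuous example
def F(x):
    # cdf: F(x) = 1 - e^{-x} for x >= 0
    return 1 - math.exp(-x) if x >= 0 else 0.0

def f(x):
    # pdf: f(x) = e^{-x} for x >= 0
    return math.exp(-x) if x >= 0 else 0.0

# A central finite difference approximates dF/dx at a point
x, h = 1.3, 1e-6
numeric_derivative = (F(x + h) - F(x - h)) / (2 * h)
```

The numerical derivative of the cdf agrees with the pdf to many decimal places, as the fundamental theorem of calculus predicts.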

Moreover, if $f$ is the pdf of a random variable $X$, then
$$Pr(a \leq X \leq b) = \int_{a}^{b} f_{X}(x)dx$$

Unlike for discrete random variables, for any real number $a$ we have $Pr(X = a) = 0$. This follows from the definition, since a single point is an interval of zero width: $Pr(X = a) = \int_{a}^{a} f_{X}(x) dx = 0$. Furthermore
$$Pr(a \leq X \leq b) = Pr(a < X \leq b) = Pr(a \leq X < b) = Pr(a < X < b)$$

For computation purposes we also notice
$$Pr(a \leq X \leq b) = F_{X}(b) - F_{X}(a) = Pr(X \leq b) - Pr(X \leq a)$$
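As a quick numerical sanity check of the identity $Pr(a \leq X \leq b) = F_{X}(b) - F_{X}(a)$, the sketch below uses an illustrative pdf of my own choosing (not from the text): $f(x) = 2x$ on $[0, 1]$, which integrates to 1 and has cdf $F(x) = x^2$:

```python
# Illustrative pdf (my choice): f(x) = 2x on [0, 1], so F(x) = x^2 there
def f(x):
    return 2 * x if 0 <= x <= 1 else 0.0

def F(x):
    if x < 0:
        return 0.0
    if x > 1:
        return 1.0
    return x * x

a, b = 0.2, 0.5

# Interval probability via the cdf difference F(b) - F(a)
prob_cdf = F(b) - F(a)

# The same probability via a midpoint Riemann sum of the pdf over [a, b]
n = 10_000
dx = (b - a) / n
prob_integral = sum(f(a + (i + 0.5) * dx) for i in range(n)) * dx
```

Both routes give $Pr(0.2 \leq X \leq 0.5) = 0.25 - 0.04 = 0.21$, up to floating-point error in the Riemann sum.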

That is a lot of equations with seemingly no use for any of them yet, so let's look at examples to see if we can salvage all the reading done so far.

Example 4: Consider the function
$f_{X}(x) = \lambda x e^{-x}$ for $x>0$ and 0 otherwise

1. For what value of $\lambda$ is $f$ a pdf?
2. Find $F_{X}(x)$.

Solution:

From the definition of a pdf $\int_{-\infty}^{\infty} f_{X}(x) dx = 1$

$$\int_{0}^{\infty} \lambda x e^{-x} dx = 1$$
Integrating by parts, an antiderivative of $x e^{-x}$ is $-(x+1)e^{-x}$, so
$$\lambda \int_{0}^{\infty} x e^{-x} dx = \lambda \left[-(x+1)e^{-x}\right]_{0}^{\infty} = \lambda\left(0 - (-1)\right) = \lambda = 1$$

The cdf is

$F_{X}(x) = \int_{-\infty}^{x} f(t)dt = \int_{0}^{x} te^{-t} dt = 1 - (x + 1)e^{-x}$ for $x \geq 0$ and $0$ otherwise.
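The value $\lambda = 1$ and the closed-form cdf can both be sanity-checked with a crude midpoint Riemann sum (the `integrate` helper below is my own, purely for illustration):

```python
import math

# pdf from Example 4 with lambda = 1: f(x) = x e^{-x} for x > 0
def f(x):
    return x * math.exp(-x) if x > 0 else 0.0

# closed-form cdf derived above: F(x) = 1 - (x + 1) e^{-x} for x >= 0
def F(x):
    return 1 - (x + 1) * math.exp(-x) if x >= 0 else 0.0

# midpoint Riemann sum as a simple numerical integral
def integrate(g, a, b, n=100_000):
    dx = (b - a) / n
    return sum(g(a + (i + 0.5) * dx) for i in range(n)) * dx

# total probability mass is approximately 1 (the tail beyond x = 50 is negligible)
total = integrate(f, 0, 50)

# the numerically integrated cdf matches the closed form at an arbitrary point
approx = integrate(f, 0, 2)
```

Truncating the upper limit at 50 is harmless here because $x e^{-x}$ decays exponentially, so the omitted tail mass is vanishingly small.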

# The Cumulative Distribution Function

There are further properties of the cumulative distribution function which are important to mention. Once again, the cdf is defined as
$$F_{X}(x) = Pr(X \leq x)$$

Discrete case: $F_{X}(x) = \sum_{t \leq x} f(t)$
Continuous case: $F_{X}(x) = \int_{-\infty}^{x} f(t)dt$

Further properties are:

1. $0 \leq F_{X}(x) \leq 1$
2. $F_{X}(x)$ is a non-decreasing function because it accumulates probability as $x$ increases
3. $F_{X}(x) = 0$ for every point $x$ which is less than the smallest value in the space of $X$
4. $F_{X}(x) = 1$ for every point $x$ which is greater than the largest value in the space of $X$
5. If $X$ is discrete, $F_{X}(x)$ is a step function whose jump at $x$ is equal to $f_{X}(x) = Pr(X = x)$
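The step-function behaviour in the discrete case can be seen concretely for the two-coin Example 3. In the sketch below (with the pmf hard-coded), the cdf is flat between the support points $0, 1, 2$ and jumps by $f_{X}(x)$ at each of them:

```python
from fractions import Fraction

# pmf of X = number of heads in two fair coin tosses (Example 3)
pmf = {0: Fraction(1, 4), 1: Fraction(1, 2), 2: Fraction(1, 4)}

def F(x):
    # step-function cdf: accumulate pmf mass at support points <= x
    return sum(p for value, p in pmf.items() if value <= x)

# F is constant between jumps, e.g. F(0.5) == F(0) and F(1.7) == F(1),
# and the jump at each support point x equals pmf[x]
```

Evaluating $F$ at non-support points like $0.5$ or $1.7$ returns the value at the most recent jump, which is exactly what properties 2 to 5 describe.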

# References

1. Mathematical Statistics with Applications by Kandethody M. Ramachandran and Chris P. Tsokos
2. Probability and Statistics by Morris Degroot (My all time favourite probability text)