
RANDOM VARIABLES AND PROBABILITY DISTRIBUTIONS

1. DISCRETE RANDOM VARIABLES

1.1. Definition of a Discrete Random Variable. A random variable X is said to be discrete if it can assume only a finite or countably infinite number of distinct values. A discrete random variable can be defined on either a countable or an uncountable sample space.

1.2. Probability for a discrete random variable. The probability that X takes on the value x, P(X = x), is defined as the sum of the probabilities of all sample points in Ω that are assigned the value x. We may denote P(X = x) by p(x) or $p_X(x)$. The expression $p_X(x)$ is a function that assigns probabilities to each possible value x; thus it is often called the probability function for the random variable X.

1.3. Probability distribution for a discrete random variable. The probability distribution for a discrete random variable X can be represented by a formula, a table, or a graph, which provides $p_X(x) = P(X = x)$ for all x. The probability distribution for a discrete random variable assigns nonzero probabilities to only a countable number of distinct x values. Any value x not explicitly assigned a positive probability is understood to be such that P(X = x) = 0.

The function $p_X(x) = P(X = x)$ for each x within the range of X is called the probability distribution of X. It is often called the probability mass function for the discrete random variable X.

1.4. Properties of the probability distribution for a discrete random variable. A function can serve as the probability distribution for a discrete random variable X if and only if its values, $p_X(x)$, satisfy the conditions:

a: $p_X(x) \ge 0$ for each value within its domain

b: $\sum_x p_X(x) = 1$, where the summation extends over all the values within its domain

1.5. Examples of probability mass functions.

1.5.1. Example 1. Find a formula for the probability distribution of the total number of heads ob-

tained in four tosses of a balanced coin.

The sample space, probabilities and the value of the random variable are given in table 1.

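From the table we can determine the probabilities as

\[ P(X=0) = \frac{1}{16},\quad P(X=1) = \frac{4}{16},\quad P(X=2) = \frac{6}{16},\quad P(X=3) = \frac{4}{16},\quad P(X=4) = \frac{1}{16} \tag{1} \]

Notice that the denominators of the five fractions are the same and the numerators of the five fractions are 1, 4, 6, 4, 1. The numbers in the numerators are a set of binomial coefficients:

\[ 1 = \binom{4}{0},\quad 4 = \binom{4}{1},\quad 6 = \binom{4}{2},\quad 4 = \binom{4}{3},\quad 1 = \binom{4}{4} \]
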
Date: November 1, 2005.

TABLE 1. Probability of a Function of the Number of Heads from Tossing a Coin Four Times.

Element of sample space    Probability    Value of random variable X (x)
HHHH                       1/16           4
HHHT                       1/16           3
HHTH                       1/16           3
HTHH                       1/16           3
THHH                       1/16           3
HHTT                       1/16           2
HTHT                       1/16           2
HTTH                       1/16           2
THHT                       1/16           2
THTH                       1/16           2
TTHH                       1/16           2
HTTT                       1/16           1
THTT                       1/16           1
TTHT                       1/16           1
TTTH                       1/16           1
TTTT                       1/16           0

We can then write the probability mass function as

\[ p_X(x) = \frac{\binom{4}{x}}{16} \quad \text{for } x = 0, 1, 2, 3, 4 \tag{2} \]

Note that all the probabilities are positive and that they sum to one.
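
As a quick check on equation 2, the following sketch enumerates the 16 outcomes of Table 1 and compares the counted probabilities with the binomial-coefficient formula. The variable names are illustrative and not part of the original notes.

```python
from fractions import Fraction
from itertools import product
from math import comb

# All 2**4 equally likely outcomes of four tosses of a balanced coin.
outcomes = list(product("HT", repeat=4))

# pmf by direct counting: (number of outcomes with x heads) / 16.
pmf_counted = {
    x: Fraction(sum(1 for o in outcomes if o.count("H") == x), len(outcomes))
    for x in range(5)
}

# pmf from the closed form in equation 2: C(4, x) / 16.
pmf_formula = {x: Fraction(comb(4, x), 16) for x in range(5)}

print(pmf_counted == pmf_formula)   # the two constructions agree
print(sum(pmf_formula.values()))    # total probability
```

Exact fractions are used so that the comparison and the sum do not depend on floating-point rounding.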

1.5.2. Example 2. Roll a red die and a green die. Let the random variable be the larger of the two

numbers if they are different and the common value if they are the same. There are 36 points in

the sample space. In table 2 the outcomes are listed along with the value of the random variable

associated with each outcome.

The probability that X = 1, P(X=1) = P[(1, 1)] = 1/36. The probability that X = 2, P(X=2) = P[(1, 2),

(2,1), (2, 2)] = 3/36. Continuing we obtain

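\[ P(X=1) = \frac{1}{36},\quad P(X=2) = \frac{3}{36},\quad P(X=3) = \frac{5}{36}, \]
\[ P(X=4) = \frac{7}{36},\quad P(X=5) = \frac{9}{36},\quad P(X=6) = \frac{11}{36} \]

We can then write the probability mass function as

\[ p_X(x) = P(X = x) = \frac{2x - 1}{36} \quad \text{for } x = 1, 2, 3, 4, 5, 6 \]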

Note that all the probabilities are positive and that they sum to one.
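
The formula (2x − 1)/36 can be checked the same way by listing all 36 (red, green) pairs and taking the larger number. This is a minimal sketch; `pmf_max` is a name introduced here for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely (red, green) outcomes.
rolls = list(product(range(1, 7), repeat=2))

# X is the larger of the two numbers (or the common value if they match).
pmf_max = {
    x: Fraction(sum(1 for red, green in rolls if max(red, green) == x), len(rolls))
    for x in range(1, 7)
}

for x in range(1, 7):
    # Compare the counted probability with (2x - 1)/36.
    print(x, pmf_max[x], pmf_max[x] == Fraction(2 * x - 1, 36))
```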

1.6. Cumulative Distribution Functions.

TABLE 2. Possible Outcomes of Rolling a Red Die and a Green Die – First Number in Pair is Number on Red Die

[Table body not recoverable: columns are the green die values 1 to 6 (across), rows are the red die values 1 to 6 (down).]

1.6.1. Definition of a Cumulative Distribution Function. If X is a discrete random variable, the function given by

\[ F_X(x) = P(X \le x) = \sum_{t \le x} p(t) \quad \text{for } -\infty < x < \infty \tag{3} \]

where p(t) is the value of the probability distribution of X at t, is called the cumulative distribution function of X. The function $F_X(x)$ is also called the distribution function of X.

1.6.2. Properties of a Cumulative Distribution Function. The values $F_X(x)$ of the distribution function of a discrete random variable X satisfy the conditions

1: F(−∞) = 0 and F(∞) = 1;

2: If a < b, then F(a) ≤ F(b) for any real numbers a and b

1.6.3. First example of a cumulative distribution function. Consider tossing a coin four times. The

possible outcomes are contained in table 1 and the values of p(·) in equation 2. From this we can

determine the cumulative distribution function as follows.

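\[
\begin{aligned}
F(0) &= p(0) = \tfrac{1}{16} \\
F(1) &= p(0) + p(1) = \tfrac{5}{16} \\
F(2) &= p(0) + p(1) + p(2) = \tfrac{11}{16} \\
F(3) &= p(0) + p(1) + p(2) + p(3) = \tfrac{15}{16} \\
F(4) &= p(0) + p(1) + p(2) + p(3) + p(4) = \tfrac{16}{16} = 1
\end{aligned}
\]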

We can write this in an alternative fashion as

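\[
F_X(x) =
\begin{cases}
0 & \text{for } x < 0 \\
\tfrac{1}{16} & \text{for } 0 \le x < 1 \\
\tfrac{5}{16} & \text{for } 1 \le x < 2 \\
\tfrac{11}{16} & \text{for } 2 \le x < 3 \\
\tfrac{15}{16} & \text{for } 3 \le x < 4 \\
1 & \text{for } x \ge 4
\end{cases}
\]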

1.6.4. Second example of a cumulative distribution function. Consider a group of N individuals, M of whom are female. Then N − M are male. Now pick n individuals from this population without replacement. Let x be the number of females chosen. There are $\binom{M}{x}$ ways of choosing x females from the M in the population and $\binom{N-M}{n-x}$ ways of choosing n − x of the N − M males. Therefore, there are $\binom{M}{x}\binom{N-M}{n-x}$ ways of choosing x females and n − x males. Because there are $\binom{N}{n}$ ways of choosing n of the N elements in the set, and because we will assume that they all are equally likely, the probability of x females in a sample of size n is given by

\[ p_X(x) = P(X = x) = \frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}} \quad \text{for } x = 0, 1, 2, 3, \cdots, n, \text{ and } x \le M, \text{ and } n - x \le N - M. \tag{4} \]

For this discrete distribution we compute the cumulative distribution function by adding up the appropriate terms of the probability mass function.

\[
\begin{aligned}
F(0) &= p(0) \\
F(1) &= p(0) + p(1) \\
F(2) &= p(0) + p(1) + p(2) \\
F(3) &= p(0) + p(1) + p(2) + p(3) \\
&\;\;\vdots \\
F(n) &= p(0) + p(1) + p(2) + p(3) + \cdots + p(n)
\end{aligned}
\tag{5}
\]

Consider a population with four individuals, three of whom are female, denoted respectively by A, B, C, D where A is a male and the others are females. Then consider drawing two from this population. Based on equation 4 there should be $\binom{4}{2} = 6$ elements in the sample space. The sample space is given by

TABLE 3. Drawing Two Individuals from a Population of Four where Order

Does Not Matter (no replacement)

Element of sample space Probability Value of random variable X

AB 1/6 1

AC 1/6 1

AD 1/6 1

BC 1/6 2

BD 1/6 2

CD 1/6 2


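We can see that the probability of 2 females is 1/2. We can also obtain this using the formula as follows.

\[ p(2) = P(X = 2) = \frac{\binom{3}{2}\binom{1}{0}}{\binom{4}{2}} = \frac{(3)(1)}{6} = \frac{1}{2} \tag{6} \]

Similarly

\[ p(1) = P(X = 1) = \frac{\binom{3}{1}\binom{1}{1}}{\binom{4}{2}} = \frac{(3)(1)}{6} = \frac{1}{2} \tag{7} \]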

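We cannot use the formula to compute p(0) because the condition n − x ≤ N − M fails: (2 − 0) ≰ (4 − 3). With only one male in the population, a sample of two must contain at least one female, so p(0) is equal to 0. We can then compute the cumulative distribution function as

\[
\begin{aligned}
F(0) &= p(0) = 0 \\
F(1) &= p(0) + p(1) = \tfrac{1}{2} \\
F(2) &= p(0) + p(1) + p(2) = 1
\end{aligned}
\tag{8}
\]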
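
Equations 4 through 8 can be reproduced with a short sketch. The function names `hypergeom_pmf` and `hypergeom_cdf` are introduced here for illustration. The pmf returns 0 whenever the side conditions of equation 4 fail, which is how p(0) = 0 arises in the four-person example.

```python
from fractions import Fraction
from math import comb

def hypergeom_pmf(x, N, M, n):
    """P(X = x): x females in a sample of n drawn without replacement
    from N individuals of whom M are female (equation 4)."""
    if x < 0 or x > n or x > M or n - x > N - M:
        return Fraction(0)  # side conditions of equation 4 are violated
    return Fraction(comb(M, x) * comb(N - M, n - x), comb(N, n))

def hypergeom_cdf(x, N, M, n):
    """F(x) = p(0) + p(1) + ... + p(x), as in equation 5."""
    return sum((hypergeom_pmf(t, N, M, n) for t in range(x + 1)), Fraction(0))

# Population of four (three female), sample of two.
for x in range(3):
    print(x, hypergeom_pmf(x, 4, 3, 2), hypergeom_cdf(x, 4, 3, 2))
```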

1.7. Expected value.

1.7.1. Definition of expected value. Let X be a discrete random variable with probability function $p_X(x)$. Then the expected value of X, E(X), is defined to be

\[ E(X) = \sum_x x\, p_X(x) \tag{9} \]

if it exists. The expected value exists if

\[ \sum_x |x|\, p_X(x) < \infty \tag{10} \]

The expected value is a kind of weighted average. It is also sometimes referred to as the population mean of the random variable and denoted $\mu_X$.

1.7.2. First example computing an expected value. Toss a die that has six sides. Observe the number

that comes up. The probability mass or frequency function is given by

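\[
p_X(x) = P(X = x) =
\begin{cases}
\tfrac{1}{6} & \text{for } x = 1, 2, 3, 4, 5, 6 \\
0 & \text{otherwise}
\end{cases}
\tag{11}
\]

We compute the expected value as

\[ E(X) = \sum_{x \in X} x\, p_X(x) = \sum_{i=1}^{6} i \left(\frac{1}{6}\right) = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} = 3.5 \tag{12} \]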


1.7.3. Second example computing an expected value. Consider a group of 12 television sets, two of

which have white cords and ten which have black cords. Suppose three of them are chosen at ran-

dom and shipped to a care center. What are the probabilities that zero, one, or two of the sets with

white cords are shipped? What is the expected number with white cords that will be shipped?

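It is clear that x of the two sets with white cords and 3 − x of the ten sets with black cords can be chosen in $\binom{2}{x}\binom{10}{3-x}$ ways. The three sets can be chosen in $\binom{12}{3}$ ways. So we have a probability mass function as follows.

\[ p_X(x) = P(X = x) = \frac{\binom{2}{x}\binom{10}{3-x}}{\binom{12}{3}} \quad \text{for } x = 0, 1, 2 \tag{13} \]

For example

\[ p(0) = P(X = 0) = \frac{\binom{2}{0}\binom{10}{3-0}}{\binom{12}{3}} = \frac{(1)(120)}{220} = \frac{6}{11} \tag{14} \]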

We collect this information as in table 4.

TABLE 4. Probabilities for Television Problem

x         0       1       2
p_X(x)    6/11    9/22    1/22
F_X(x)    6/11    21/22   1

We compute the expected value as

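\[ E(X) = \sum_{x \in X} x\, p_X(x) = (0)\left(\frac{6}{11}\right) + (1)\left(\frac{9}{22}\right) + (2)\left(\frac{1}{22}\right) = \frac{11}{22} = \frac{1}{2} \tag{15} \]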
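
The entries of Table 4 and the expected value in equation 15 can be recomputed directly from equation 13. This is a sketch with illustrative names (`pmf_tv`, `expected_value`), using exact fractions.

```python
from fractions import Fraction
from math import comb

# pmf from equation 13: x white-cord sets among 3 shipped,
# chosen from 2 white-cord and 10 black-cord sets.
pmf_tv = {
    x: Fraction(comb(2, x) * comb(10, 3 - x), comb(12, 3))
    for x in range(3)
}

def expected_value(pmf):
    """E(X) = sum of x * p(x) over the support (equation 9)."""
    return sum(x * p for x, p in pmf.items())

print(pmf_tv)                  # probabilities for x = 0, 1, 2
print(expected_value(pmf_tv))  # expected number of white-cord sets
```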

1.8. Expected value of a function of a random variable.

Theorem 1. Let X be a discrete random variable with probability mass function $p_X(x)$ and g(X) be a real-valued function of X. Then the expected value of g(X) is given by

\[ E[g(X)] = \sum_x g(x)\, p_X(x). \tag{16} \]

Proof for case of finite values of X. Consider the case where the random variable X takes on a finite number of values $x_1, x_2, x_3, \cdots, x_n$. The function g(x) may not be one-to-one (different values of $x_i$ may yield the same value of $g(x_i)$). Suppose that g(X) takes on m different values (m ≤ n). It follows that g(X) is also a random variable with possible values $g_1, g_2, g_3, \ldots, g_m$ and probability distribution

\[ P[g(X) = g_i] = \sum_{\substack{\forall x_j \text{ such that} \\ g(x_j) = g_i}} p(x_j) = p^*(g_i) \tag{17} \]

for all i = 1, 2, . . . , m. Here $p^*(g_i)$ is the probability that the experiment results in a value of $g_i$ for the function g of the initial random variable. Using the definition of expected value in equation 9 we obtain

\[ E[g(X)] = \sum_{i=1}^{m} g_i\, p^*(g_i). \tag{18} \]

Now substitute in to obtain

\[
\begin{aligned}
E[g(X)] &= \sum_{i=1}^{m} g_i\, p^*(g_i) \\
&= \sum_{i=1}^{m} g_i \Bigg[ \sum_{\substack{\forall x_j \ni \\ g(x_j) = g_i}} p(x_j) \Bigg] \\
&= \sum_{i=1}^{m} \Bigg[ \sum_{\substack{\forall x_j \ni \\ g(x_j) = g_i}} g_i\, p(x_j) \Bigg] \\
&= \sum_{j=1}^{n} g(x_j)\, p(x_j)
\end{aligned}
\tag{19}
\]



1.9. Properties of mathematical expectation.

1.9.1. Constants.

Theorem 2. Let X be a discrete random variable with probability function $p_X(x)$ and c be a constant. Then E(c) = c.

Proof. Consider the function g(X) = c. Then by theorem 1

\[ E[c] \equiv \sum_x c\, p_X(x) = c \sum_x p_X(x) \tag{20} \]

But by property 1.4b, we have

\[ \sum_x p_X(x) = 1 \]

and hence

\[ E(c) = c \cdot (1) = c. \tag{21} \]




1.9.2. Constants multiplied by functions of random variables.

Theorem 3. Let X be a discrete random variable with probability function $p_X(x)$, g(X) be a function of X, and let c be a constant. Then

\[ E[c\, g(X)] \equiv c\, E[g(X)] \tag{22} \]

Proof. By theorem 1 we have

\[
\begin{aligned}
E[c\, g(X)] &\equiv \sum_x c\, g(x)\, p_X(x) \\
&= c \sum_x g(x)\, p_X(x) \\
&= c\, E[g(X)]
\end{aligned}
\tag{23}
\]



1.9.3. Sums of functions of random variables.

Theorem 4. Let X be a discrete random variable with probability function $p_X(x)$, and let $g_1(X), g_2(X), g_3(X), \cdots, g_k(X)$ be k functions of X. Then

\[ E[g_1(X) + g_2(X) + \cdots + g_k(X)] \equiv E[g_1(X)] + E[g_2(X)] + \cdots + E[g_k(X)] \tag{24} \]

Proof for the case of k = 2. By theorem 1 we have

\[
\begin{aligned}
E[g_1(X) + g_2(X)] &\equiv \sum_x [g_1(x) + g_2(x)]\, p_X(x) \\
&\equiv \sum_x g_1(x)\, p_X(x) + \sum_x g_2(x)\, p_X(x) \\
&= E[g_1(X)] + E[g_2(X)],
\end{aligned}
\tag{25}
\]



1.10. Variance of a random variable.

1.10.1. Definition of variance. The variance of a random variable X is defined to be the expected value of $(X - \mu)^2$. That is

\[ V(X) = E\left[(X - \mu)^2\right] \tag{26} \]

The standard deviation of X is the positive square root of V(X).

1.10.2. Example 1. Consider a random variable with the following probability distribution.

TABLE 5. Probability Distribution for X

x    p_X(x)
0    1/8
1    1/4
2    3/8
3    1/4

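We can compute the expected value as

\[ \mu = E(X) = \sum_{x=0}^{3} x\, p_X(x) = (0)\left(\frac{1}{8}\right) + (1)\left(\frac{1}{4}\right) + (2)\left(\frac{3}{8}\right) + (3)\left(\frac{1}{4}\right) = 1.75 \tag{27} \]

We compute the variance as

\[
\begin{aligned}
\sigma^2 = E[(X - \mu)^2] &= \sum_{x=0}^{3} (x - \mu)^2\, p_X(x) \\
&= (0 - 1.75)^2\left(\frac{1}{8}\right) + (1 - 1.75)^2\left(\frac{1}{4}\right) + (2 - 1.75)^2\left(\frac{3}{8}\right) + (3 - 1.75)^2\left(\frac{1}{4}\right) \\
&= 0.9375
\end{aligned}
\]

and the standard deviation as

\[ \sigma^2 = 0.9375, \qquad \sigma = +\sqrt{0.9375} = 0.97. \]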
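
The calculation above follows the definition V(X) = E[(X − µ)²] term by term. A minimal sketch of the same steps for Table 5, with helper names (`mean_of`, `variance_of`) that are assumptions of this example:

```python
from fractions import Fraction
from math import sqrt

# Table 5 as a dictionary {x: p(x)}.
pmf_table5 = {0: Fraction(1, 8), 1: Fraction(1, 4), 2: Fraction(3, 8), 3: Fraction(1, 4)}

def mean_of(pmf):
    """mu = sum of x * p(x)."""
    return sum(x * p for x, p in pmf.items())

def variance_of(pmf):
    """V(X) = sum of (x - mu)^2 * p(x), straight from the definition."""
    mu = mean_of(pmf)
    return sum((x - mu) ** 2 * p for x, p in pmf.items())

mu = mean_of(pmf_table5)
var = variance_of(pmf_table5)
print(float(mu), float(var), round(sqrt(var), 2))
```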

1.10.3. Alternative formula for the variance.

Theorem 5. Let X be a discrete random variable with probability function $p_X(x)$; then

\[ V(X) \equiv \sigma^2 = E\left[(X - \mu)^2\right] = E\left[X^2\right] - \mu^2 \tag{28} \]

Proof. First write out the first part of equation 28 as follows

\[
\begin{aligned}
V(X) \equiv \sigma^2 &= E\left[(X - \mu)^2\right] \\
&= E\left[X^2 - 2\mu X + \mu^2\right] \\
&= E\left[X^2\right] - E(2\mu X) + E\left[\mu^2\right]
\end{aligned}
\tag{29}
\]

where the last step follows from theorem 4. Note that µ is a constant, then apply theorems 3 and 2 to the second and third terms in equation 29 to obtain

\[ V(X) \equiv \sigma^2 = E\left[(X - \mu)^2\right] = E\left[X^2\right] - 2\mu E(X) + \mu^2 \tag{30} \]

Then making the substitution that E(X) = µ, we obtain

\[ V(X) \equiv \sigma^2 = E\left[X^2\right] - \mu^2 \tag{31} \]



1.10.4. Example 2. Die toss.

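Toss a die that has six sides. Observe the number that comes up. The probability mass or frequency function is given by

\[
p_X(x) = P(X = x) =
\begin{cases}
\tfrac{1}{6} & \text{for } x = 1, 2, 3, 4, 5, 6 \\
0 & \text{otherwise}
\end{cases}
\tag{32}
\]

We compute the expected value as

\[ E(X) = \sum_{x \in X} x\, p_X(x) = \sum_{i=1}^{6} i \left(\frac{1}{6}\right) = \frac{1+2+3+4+5+6}{6} = \frac{21}{6} \tag{33} \]

We compute the variance by first computing $E(X^2)$ as follows

\[ E(X^2) = \sum_{x \in X} x^2\, p_X(x) = \sum_{i=1}^{6} i^2 \left(\frac{1}{6}\right) = \frac{1+4+9+16+25+36}{6} = \frac{91}{6} \approx 15.17 \tag{34} \]

We can then compute the variance using the formula $Var(X) = E(X^2) - E^2(X)$ and the fact that E(X) = 21/6 from equation 33.

\[
\begin{aligned}
Var(X) &= E(X^2) - E^2(X) \\
&= \frac{91}{6} - \left(\frac{21}{6}\right)^2 \\
&= \frac{91}{6} - \frac{441}{36} \\
&= \frac{546}{36} - \frac{441}{36} \\
&= \frac{105}{36} = 2.9166
\end{aligned}
\tag{35}
\]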
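
The shortcut formula of theorem 5 and the definition of variance should give the same number. The sketch below checks this for the die, using exact fractions; the variable names are illustrative.

```python
from fractions import Fraction

# Fair six-sided die: p(x) = 1/6 for x = 1, ..., 6.
pmf_die = {x: Fraction(1, 6) for x in range(1, 7)}

mean = sum(x * p for x, p in pmf_die.items())              # E(X)
second_moment = sum(x ** 2 * p for x, p in pmf_die.items())  # E(X^2)

var_shortcut = second_moment - mean ** 2                    # E(X^2) - E(X)^2
var_definition = sum((x - mean) ** 2 * p for x, p in pmf_die.items())

print(mean, second_moment, var_shortcut, var_definition)
```

Both routes give 105/36, which reduces to 35/12.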

2. THE "DISTRIBUTION" OF RANDOM VARIABLES IN GENERAL

2.1. Cumulative distribution function. The cumulative distribution function (cdf) of a random variable X, denoted by $F_X(\cdot)$, is defined to be the function with domain the real line and range the interval [0,1], which satisfies $F_X(x) = P[X \le x] = P[\{\omega : X(\omega) \le x\}]$ for every real number x. F has the following properties:

\[ F_X(-\infty) = \lim_{x \to -\infty} F_X(x) = 0, \qquad F_X(+\infty) = \lim_{x \to +\infty} F_X(x) = 1, \tag{36a} \]

\[ F_X(a) \le F_X(b) \text{ for } a < b, \text{ nondecreasing function of } x, \tag{36b} \]

\[ \lim_{0 < h \to 0} F_X(x + h) = F_X(x), \text{ continuous from the right,} \tag{36c} \]

2.2. Example of a cumulative distribution function. Consider the following function

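\[ F_X(x) = \frac{1}{1 + e^{-x}} \tag{37} \]

Check condition 36a as follows.

\[
\begin{aligned}
\lim_{x \to -\infty} F_X(x) &= \lim_{x \to -\infty} \frac{1}{1 + e^{-x}} = \lim_{x \to \infty} \frac{1}{1 + e^{x}} = 0 \\
\lim_{x \to \infty} F_X(x) &= \lim_{x \to \infty} \frac{1}{1 + e^{-x}} = 1
\end{aligned}
\tag{38}
\]

To check condition 36b differentiate the cdf as follows

\[ \frac{d F_X(x)}{dx} = \frac{d}{dx}\left(\frac{1}{1 + e^{-x}}\right) = \frac{e^{-x}}{(1 + e^{-x})^2} > 0 \tag{39} \]

Condition 36c is satisfied because $F_X(x)$ is a continuous function.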

2.3. Discrete and continuous random variables.

2.3.1. Discrete random variable. A random variable X will be said to be discrete if the range of X is countable, that is if it can assume only a finite or countably infinite number of values. Alternatively, a random variable is discrete if $F_X(x)$ is a step function of x.

2.3.2. Continuous random variable. A random variable X is continuous if $F_X(x)$ is a continuous function of x.

2.4. Frequency (probability mass) function of a discrete random variable.

2.4.1. Definition of a frequency (discrete density) function. If X is a discrete random variable with the distinct values $x_1, x_2, \cdots, x_n, \cdots$, then the function denoted by $p_X(\cdot)$ and defined by

\[
p_X(x) =
\begin{cases}
P[X = x_j] & x = x_j,\ j = 1, 2, \ldots, n, \ldots \\
0 & x \ne x_j
\end{cases}
\tag{40}
\]

is defined to be the frequency, discrete density, or probability mass function of X. We will often write $f_X(x)$ for $p_X(x)$ to denote frequency as compared to probability.

A discrete probability distribution on $R^k$ is a probability measure P such that

\[ \sum_{i=1}^{\infty} P(\{x_i\}) = 1 \tag{41} \]

for some sequence of points in $R^k$, i.e. the sequence of points that occur as an outcome of the experiment. Given the definition of the frequency function in equation 40, we can also say that any non-negative function p on $R^k$ that vanishes except on a sequence $x_1, x_2, \cdots, x_n, \cdots$ of vectors and that satisfies

\[ \sum_{i=1}^{\infty} p(x_i) = 1 \]

defines a unique probability distribution by the relation

\[ P(A) = \sum_{x_i \in A} p(x_i) \tag{42} \]

2.4.2. Properties of discrete density functions. As defined in section 1.4, a probability mass function must satisfy

\[ p_X(x_j) > 0, \text{ for } j = 1, 2, \ldots \tag{43a} \]

\[ p_X(x) = 0, \text{ for } x \ne x_j;\ j = 1, 2, \ldots, \tag{43b} \]

\[ \sum_j p_X(x_j) = 1 \tag{43c} \]

2.4.3. Example 1 of a discrete density function. Consider a probability model where there are two possible outcomes to a single action (say heads and tails) and consider repeating this action several times until one of the outcomes occurs. Let the random variable be the number of actions required to obtain a particular outcome (say heads). Let p be the probability that the outcome is a head and (1 − p) the probability of a tail. Then to obtain the first head on the x-th toss, we need to have a tail on the previous x − 1 tosses. So the probability of the first head occurring on the x-th toss is given by

\[
p_X(x) = P(X = x) =
\begin{cases}
(1 - p)^{x - 1}\, p & \text{for } x = 1, 2, \ldots \\
0 & \text{otherwise}
\end{cases}
\tag{44}
\]

For example, with a balanced coin (p = 1/2), the probability that it takes 4 tosses to get a head is 1/16 while the probability it takes 2 tosses is 1/4.
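
A short sketch of equation 44 follows. The function name `geometric_pmf` and the choice p = 1/2 are assumptions for illustration. It reproduces the two probabilities quoted above and shows the partial sums approaching one.

```python
from fractions import Fraction

def geometric_pmf(x, p):
    """P(first head on toss x) = (1 - p)**(x - 1) * p for x = 1, 2, ..."""
    if x < 1:
        return Fraction(0)
    return (1 - p) ** (x - 1) * p

half = Fraction(1, 2)  # balanced coin
print(geometric_pmf(4, half))  # first head on the fourth toss
print(geometric_pmf(2, half))  # first head on the second toss

# The first 20 terms already account for almost all of the probability.
partial_sum = sum(geometric_pmf(x, half) for x in range(1, 21))
print(float(partial_sum))
```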

2.4.4. Example 2 of a discrete density function. Consider tossing a die. The sample space is {1, 2, 3, 4, 5, 6}. The elements are {1}, {2}, ... . The frequency function is given by

\[
p_X(x) = P(X = x) =
\begin{cases}
\tfrac{1}{6} & \text{for } x = 1, 2, 3, 4, 5, 6 \\
0 & \text{otherwise}
\end{cases}
\tag{45}
\]

The density function is represented in ﬁgure 1.

2.5. Probability density function of a continuous random variable.

FIGURE 1. Frequency Function for Tossing a Die

2.5.1. Alternative definition of continuous random variable. In section 2.3.2, we defined a random variable to be continuous if $F_X(x)$ is a continuous function of x. We also say that a random variable X is continuous if there exists a function f(·) such that

\[ F_X(x) = \int_{-\infty}^{x} f(u)\, du \tag{46} \]

for every real number x. The integral in equation 46 is a Riemann integral evaluated from −∞ to a real number x.

2.5.2. Definition of a probability density frequency function (pdf). The probability density function, $f_X(x)$, of a continuous random variable X is the function f(·) that satisfies

\[ F_X(x) = \int_{-\infty}^{x} f_X(u)\, du \tag{47} \]

2.5.3. Properties of continuous density functions.

\[ f_X(x) \ge 0 \quad \forall x \tag{48a} \]

\[ \int_{-\infty}^{\infty} f_X(x)\, dx = 1, \tag{48b} \]

Analogous to equation 42, we can write in the continuous case

\[ P(X \in A) = \int_A f_X(x)\, dx \tag{49} \]

where the integral is interpreted in the sense of Lebesgue.

Theorem 6. For a density function $f_X(x)$ defined over the set of all real numbers the following holds

\[ P(a \le X \le b) = \int_a^b f_X(x)\, dx \tag{50} \]

for any real constants a and b with a ≤ b.

Also note that for a continuous random variable X the following are equal

\[ P(a \le X \le b) = P(a \le X < b) = P(a < X \le b) = P(a < X < b) \tag{51} \]

Note that we can obtain the various probabilities by integrating the area under the density function as seen in figure 2.

FIGURE 2. Area under the Density Function as Probability

2.5.4. Example 1 of a continuous density function. Consider the following function

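\[
f_X(x) =
\begin{cases}
k \cdot e^{-3x} & \text{for } x > 0 \\
0 & \text{elsewhere}
\end{cases}
\tag{52}
\]

First we must find the value of k that makes this a valid density function. Given the condition in equation 48b we must have that

\[ \int_{-\infty}^{\infty} f_X(x)\, dx = \int_0^{\infty} k \cdot e^{-3x}\, dx = 1 \tag{53} \]

Integrate the second term to obtain

\[ \int_0^{\infty} k \cdot e^{-3x}\, dx = k \cdot \lim_{t \to \infty} \left. \frac{e^{-3x}}{-3} \right|_0^t = \frac{k}{3} \tag{54} \]

Given that this must be equal to one we obtain

\[ \frac{k}{3} = 1 \;\Rightarrow\; k = 3 \tag{55} \]

The density is then given by

\[
f_X(x) =
\begin{cases}
3 \cdot e^{-3x} & \text{for } x > 0 \\
0 & \text{elsewhere}
\end{cases}
\tag{56}
\]

Now find the probability that (1 ≤ X ≤ 2).

\[
\begin{aligned}
P(1 \le X \le 2) &= \int_1^2 3 \cdot e^{-3x}\, dx \\
&= \left. -e^{-3x} \right|_1^2 \\
&= -e^{-6} + e^{-3} \\
&= -0.00247875 + 0.049787 \\
&= 0.047308
\end{aligned}
\tag{57}
\]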
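
A numerical cross-check of equations 55 and 57 is sketched below. It uses a hand-written composite Simpson rule (the helper `simpson` is an assumption of this example, not something defined in the notes) and truncates the infinite integral at x = 20, where the remaining tail is negligible.

```python
from math import exp

def simpson(f, a, b, n=1000):
    """Composite Simpson's rule on [a, b] with n (even) subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * f(a + i * h)
    return total * h / 3

def density(x):
    # f(x) = 3 e^{-3x}; the single point x = 0 is included here,
    # which does not change any integral.
    return 3.0 * exp(-3.0 * x) if x >= 0 else 0.0

total_mass = simpson(density, 0.0, 20.0, n=20000)  # should be close to 1
prob_numeric = simpson(density, 1.0, 2.0)          # P(1 <= X <= 2)
prob_exact = exp(-3) - exp(-6)                     # closed form from equation 57

print(total_mass, prob_numeric, prob_exact)
```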

2.5.5. Example 2 of a continuous density function. Let X have p.d.f.

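\[
f_X(x) =
\begin{cases}
x \cdot e^{-x} & \text{for } 0 \le x < \infty \\
0 & \text{elsewhere}
\end{cases}
\tag{58}
\]

This density function is shown in figure 3.

We can find the probability that (1 ≤ X ≤ 2) by integration

\[ P(1 \le X \le 2) = \int_1^2 x \cdot e^{-x}\, dx \tag{59} \]

First integrate the expression on the right by parts letting u = x and $dv = e^{-x}\, dx$. Then du = dx and $v = -e^{-x}$. We then have

\[
\begin{aligned}
P(1 \le X \le 2) &= \left. -x e^{-x} \right|_1^2 - \int_1^2 -e^{-x}\, dx \\
&= -2e^{-2} + e^{-1} - \left[ e^{-x} \Big|_1^2 \right] \\
&= -2e^{-2} + e^{-1} - e^{-2} + e^{-1} \\
&= -3e^{-2} + 2e^{-1} \\
&= \frac{-3}{e^2} + \frac{2}{e} \\
&= -0.406 + 0.73575 \\
&= 0.32975
\end{aligned}
\tag{60}
\]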

FIGURE 3. Graph of Density Function $x e^{-x}$

This is represented by the area between the lines in ﬁgure 4.

We can also ﬁnd the distribution function in this case.

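\[ F_X(x) = \int_0^x t \cdot e^{-t}\, dt \tag{61} \]

Make the u dv substitution as before to obtain

\[
\begin{aligned}
F_X(x) &= \left. -t e^{-t} \right|_0^x - \int_0^x -e^{-t}\, dt \\
&= \left. \left( -t e^{-t} - e^{-t} \right) \right|_0^x \\
&= \left. e^{-t}(-1 - t) \right|_0^x \\
&= e^{-x}(-1 - x) - e^{-0}(-1 - 0) \\
&= e^{-x}(-1 - x) + 1 \\
&= 1 - e^{-x}(1 + x)
\end{aligned}
\tag{62}
\]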

The distribution function is shown in ﬁgure 5.

FIGURE 4. P(1 ≤ X ≤ 2)

Now consider the probability that (1 ≤ X ≤ 2)

\[
\begin{aligned}
P(1 \le X \le 2) &= F(2) - F(1) \\
&= 1 - e^{-2}(1 + 2) - 1 + e^{-1}(1 + 1) \\
&= 2e^{-1} - 3e^{-2} \\
&= 0.73575 - 0.406 \\
&= 0.32975
\end{aligned}
\tag{63}
\]
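
The same probability can be evaluated directly from the closed-form distribution function in equation 62. This is a minimal sketch; the function name `cdf` is introduced here for illustration.

```python
from math import exp

def cdf(x):
    """F(x) = 1 - e^{-x} (1 + x) for x > 0, and 0 otherwise (equation 62)."""
    return 1.0 - exp(-x) * (1.0 + x) if x > 0 else 0.0

prob = cdf(2.0) - cdf(1.0)       # P(1 <= X <= 2) as a difference of cdf values
closed_form = 2 * exp(-1) - 3 * exp(-2)

print(prob, closed_form)
```

Working from the distribution function avoids repeating the integration by parts for every interval of interest.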

We can see this as the difference in the values of $F_X(x)$ at 1 and at 2 in figure 6.

2.5.6. Example 3 of a continuous density function. Consider the normal density function given by

\[ f(x : \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{1}{2}\left(\frac{x - \mu}{\sigma}\right)^2} \tag{64} \]

where µ and σ are parameters of the function. The shape and location of the density function depend on the parameters µ and σ. In figure 7 the density is drawn for µ = 0, with σ = 1 and σ = 2.

2.5.7. Example 4 of a continuous density function. Consider a random variable with density function

given by

\[
f_X(x) =
\begin{cases}
(p + 1)\, x^p & 0 \le x \le 1 \\
0 & \text{otherwise}
\end{cases}
\tag{65}
\]

where p is greater than −1. For example, if p = 0, then $f_X(x) = 1$; if p = 1, then $f_X(x) = 2x$; and so on. The density function with p = 2 is shown in figure 8. The distribution function with p = 2 is shown in figure 9.

FIGURE 5. Graph of Distribution Function of Density Function $x e^{-x}$

2.6. Expected value.

2.6.1. Expectation of a single random variable. Let X be a random variable with density $f_X(x)$. The expected value of the random variable, denoted E(X), is defined to be

\[
E(X) =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} x\, f_X(x)\, dx & \text{if X is continuous} \\[2ex]
\displaystyle\sum_{x \in X} x\, p_X(x) & \text{if X is discrete}
\end{cases}
\tag{66}
\]

provided the sum or integral is defined. The expected value is a kind of weighted average. It is also sometimes referred to as the population mean of the random variable and denoted $\mu_X$.

2.6.2. Expectation of a function of a single random variable. Let X be a random variable with density $f_X(x)$. The expected value of a function g(·) of the random variable, denoted E(g(X)), is defined to be

\[ E(g(X)) = \int_{-\infty}^{\infty} g(x)\, f(x)\, dx \tag{67} \]

if the integral is defined.

FIGURE 6. P(1 ≤ X ≤ 2) using the Distribution Function

The expectation of a random variable can also be defined using the Riemann-Stieltjes integral where F is a monotonically increasing function of x. Specifically

\[ E(X) = \int_{-\infty}^{\infty} x\, dF(x) = \int_{-\infty}^{\infty} x\, dF \tag{68} \]

2.7. Properties of expectation.

2.7.1. Constants.

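\[
\begin{aligned}
E[a] &\equiv \int_{-\infty}^{\infty} a\, f_X(x)\, dx \\
&\equiv a \int_{-\infty}^{\infty} f_X(x)\, dx \\
&\equiv a
\end{aligned}
\tag{69}
\]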

2.7.2. Constants multiplied by a random variable.

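\[
\begin{aligned}
E[aX] &\equiv \int_{-\infty}^{\infty} a\, x\, f_X(x)\, dx \\
&\equiv a \int_{-\infty}^{\infty} x\, f_X(x)\, dx \\
&\equiv a\, E[X]
\end{aligned}
\tag{70}
\]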

FIGURE 7. Normal Density Function

2.7.3. Constants multiplied by a function of a random variable.

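\[
\begin{aligned}
E[a\, g(X)] &\equiv \int_{-\infty}^{\infty} a\, g(x)\, f_X(x)\, dx \\
&\equiv a \int_{-\infty}^{\infty} g(x)\, f_X(x)\, dx \\
&\equiv a\, E[g(X)]
\end{aligned}
\tag{71}
\]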

2.7.4. Sums of expected values. Let X be a continuous random variable with density function $f_X(x)$ and let $g_1(X), g_2(X), g_3(X), \cdots, g_k(X)$ be k functions of X. Also let $c_1, c_2, c_3, \cdots, c_k$ be k constants. Then

\[ E[c_1 g_1(X) + c_2 g_2(X) + \cdots + c_k g_k(X)] \equiv E[c_1 g_1(X)] + E[c_2 g_2(X)] + \cdots + E[c_k g_k(X)] \tag{72} \]

2.8. Example 1. Consider the density function

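\[
f_X(x) =
\begin{cases}
(p + 1)\, x^p & 0 \le x \le 1 \\
0 & \text{otherwise}
\end{cases}
\tag{73}
\]

where p is greater than −1. We can compute the E(X) as follows.

FIGURE 8. Density Function $(p + 1)x^p$

\[
\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x\, f_X(x)\, dx \\
&= \int_0^1 x\,(p + 1)\, x^p\, dx \\
&= \int_0^1 x^{(p+1)}\,(p + 1)\, dx \\
&= \left. \frac{x^{(p+2)}\,(p + 1)}{(p + 2)} \right|_0^1 \\
&= \frac{p + 1}{p + 2}
\end{aligned}
\tag{74}
\]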

2.9. Example 2. Consider the exponential distribution which has density function

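\[ f_X(x) = \frac{1}{\lambda}\, e^{-x/\lambda}, \quad 0 \le x < \infty,\ \lambda > 0 \tag{75} \]

We can compute the E(X) as follows.

FIGURE 9. Density Function $(p + 1)x^p$ (vertical axis labeled F(x))

\[
\begin{aligned}
E(X) &= \int_0^{\infty} x\, \frac{1}{\lambda}\, e^{-x/\lambda}\, dx \\
&= \left. -x e^{-x/\lambda} \right|_0^{\infty} + \int_0^{\infty} e^{-x/\lambda}\, dx
\qquad \left( u = \frac{x}{\lambda},\ du = \frac{1}{\lambda}\, dx,\ v = -\lambda e^{-x/\lambda},\ dv = e^{-x/\lambda}\, dx \right) \\
&= 0 + \int_0^{\infty} e^{-x/\lambda}\, dx \\
&= \left. -\lambda e^{-x/\lambda} \right|_0^{\infty} \\
&= \lambda
\end{aligned}
\tag{76}
\]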

2.10. Variance.

2.10.1. Deﬁnition of variance. The variance of a single random variable X with mean µ is given by

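\[
\begin{aligned}
Var(X) \equiv \sigma^2 &\equiv E\left[(X - E(X))^2\right] \\
&\equiv E\left[(X - \mu)^2\right] \\
&\equiv \int_{-\infty}^{\infty} (x - \mu)^2\, f_X(x)\, dx
\end{aligned}
\tag{77}
\]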

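We can write this in a different fashion by expanding the last term in equation 77.

\[
\begin{aligned}
Var(X) &\equiv \int_{-\infty}^{\infty} (x - \mu)^2\, f_X(x)\, dx \\
&\equiv \int_{-\infty}^{\infty} (x^2 - 2\mu x + \mu^2)\, f_X(x)\, dx \\
&\equiv \int_{-\infty}^{\infty} x^2 f_X(x)\, dx - 2\mu \int_{-\infty}^{\infty} x\, f_X(x)\, dx + \mu^2 \int_{-\infty}^{\infty} f_X(x)\, dx \\
&= E\left[X^2\right] - 2\mu E[X] + \mu^2 \\
&= E\left[X^2\right] - 2\mu^2 + \mu^2 \\
&= E\left[X^2\right] - \mu^2 \\
&\equiv \int_{-\infty}^{\infty} x^2 f_X(x)\, dx - \left[ \int_{-\infty}^{\infty} x\, f_X(x)\, dx \right]^2
\end{aligned}
\tag{78}
\]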

The variance is a measure of the dispersion of the random variable about the mean.

2.10.2. Variance example 1. Consider the density function

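\[
f_X(x) =
\begin{cases}
(p + 1)\, x^p & 0 \le x \le 1 \\
0 & \text{otherwise}
\end{cases}
\tag{79}
\]

where p is greater than −1. We can compute the Var(X) as follows.

\[
\begin{aligned}
E(X) &= \int_{-\infty}^{\infty} x\, f_X(x)\, dx = \int_0^1 x\,(p + 1)\, x^p\, dx = \left. \frac{x^{(p+2)}\,(p + 1)}{(p + 2)} \right|_0^1 = \frac{p + 1}{p + 2} \\
E(X^2) &= \int_0^1 x^2\,(p + 1)\, x^p\, dx = \left. \frac{x^{(p+3)}\,(p + 1)}{(p + 3)} \right|_0^1 = \frac{p + 1}{p + 3} \\
Var(X) &= E(X^2) - E^2(X) \\
&= \frac{p + 1}{p + 3} - \left( \frac{p + 1}{p + 2} \right)^2 \\
&= \frac{p + 1}{(p + 2)^2\,(p + 3)}
\end{aligned}
\tag{80}
\]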

The values of the mean and variances for various values of p are given in table 6.

TABLE 6. Mean and Variance for Distribution $f_X(x) = (p + 1)x^p$ for alternative values of p

p         -.5        0           1           2         ∞
E(x)      0.333      0.5         0.66667     0.75      1
Var(x)    0.08888    0.083333    0.055556    0.0375    0
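
The finite columns of Table 6 follow from the closed forms in equation 80, E(X) = (p + 1)/(p + 2) and Var(X) = (p + 1)/((p + 2)²(p + 3)). The sketch below evaluates them with exact fractions; the function names `power_mean` and `power_variance` are assumptions of this example.

```python
from fractions import Fraction

def power_mean(p):
    """E(X) = (p + 1)/(p + 2) for the density (p + 1) x^p on [0, 1]."""
    return (p + 1) / (p + 2)

def power_variance(p):
    """Var(X) = (p + 1) / ((p + 2)^2 (p + 3)), from equation 80."""
    return (p + 1) / ((p + 2) ** 2 * (p + 3))

for p in (Fraction(-1, 2), Fraction(0), Fraction(1), Fraction(2)):
    print(float(p), round(float(power_mean(p)), 5), round(float(power_variance(p)), 6))
```

As p grows the mean tends to 1 and the variance to 0, matching the last column of the table.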

2.10.3. Variance example 2. Consider the exponential distribution which has density function

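\[ f_X(x) = \frac{1}{\lambda}\, e^{-x/\lambda}, \quad 0 \le x < \infty,\ \lambda > 0 \tag{81} \]

We can compute the $E(X^2)$ as follows

\[
\begin{aligned}
E(X^2) &= \int_0^{\infty} x^2\, \frac{1}{\lambda}\, e^{-x/\lambda}\, dx \\
&= \left. -x^2 e^{-x/\lambda} \right|_0^{\infty} + 2 \int_0^{\infty} x\, e^{-x/\lambda}\, dx
\qquad \left( u = \frac{x^2}{\lambda},\ du = \frac{2x}{\lambda}\, dx,\ v = -\lambda e^{-x/\lambda},\ dv = e^{-x/\lambda}\, dx \right) \\
&= 0 + 2 \int_0^{\infty} x\, e^{-x/\lambda}\, dx \\
&= \left. -2\lambda x\, e^{-x/\lambda} \right|_0^{\infty} + 2 \int_0^{\infty} \lambda e^{-x/\lambda}\, dx
\qquad \left( u = 2x,\ du = 2\, dx,\ v = -\lambda e^{-x/\lambda},\ dv = e^{-x/\lambda}\, dx \right) \\
&= 0 + 2\lambda \int_0^{\infty} e^{-x/\lambda}\, dx \\
&= (2\lambda) \left( \left. -\lambda e^{-x/\lambda} \right|_0^{\infty} \right) \\
&= (2\lambda)(\lambda) \\
&= 2\lambda^2
\end{aligned}
\tag{82}
\]

We can then compute the variance as

\[
\begin{aligned}
Var(X) &= E(X^2) - E^2(X) \\
&= 2\lambda^2 - \lambda^2 \\
&= \lambda^2
\end{aligned}
\tag{83}
\]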


3. MOMENTS AND MOMENT GENERATING FUNCTIONS

3.1. Moments.

3.1.1. Moments about the origin (raw moments). The r-th moment about the origin of a random variable X, denoted by $\mu'_r$, is the expected value of $X^r$; symbolically,

\[ \mu'_r = E(X^r) = \sum_x x^r\, p_X(x) \tag{84} \]

for r = 0, 1, 2, . . . when X is discrete and

\[ \mu'_r = E(X^r) = \int_{-\infty}^{\infty} x^r f_X(x)\, dx \tag{85} \]

when X is continuous. The r-th moment about the origin is only defined if $E[X^r]$ exists. A moment about the origin is sometimes called a raw moment. Note that $\mu'_1 = E(X) = \mu_X$, the mean of the distribution of X, or simply the mean of X. The r-th moment is sometimes written as a function of θ where θ is a vector of parameters that characterize the distribution of X.

3.1.2. Central moments. The r-th moment about the mean of a random variable X, denoted by $\mu_r$, is the expected value of $(X - \mu_X)^r$; symbolically,

\[ \mu_r = E[(X - \mu_X)^r] = \sum_x (x - \mu_X)^r\, p_X(x) \tag{86} \]

for r = 0, 1, 2, . . . when X is discrete and

\[ \mu_r = E[(X - \mu_X)^r] = \int_{-\infty}^{\infty} (x - \mu_X)^r f_X(x)\, dx \tag{87} \]

when X is continuous. The r-th moment about the mean is only defined if $E[(X - \mu_X)^r]$ exists. The r-th moment about the mean of a random variable X is sometimes called the r-th central moment of X. The r-th central moment of X about a is defined as $E[(X - a)^r]$. If $a = \mu_X$, we have the r-th central moment of X about $\mu_X$. Note that $\mu_1 = E[(X - \mu_X)] = 0$ and $\mu_2 = E[(X - \mu_X)^2] = Var[X]$. Also note that all odd moments of X around its mean are zero for symmetrical distributions, provided such moments exist.

3.1.3. Alternative formula for the variance.

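Theorem 7.

\[ \sigma^2 = \mu'_2 - \mu^2 \tag{88} \]

Proof.

\[
\begin{aligned}
Var(X) \equiv \sigma^2 &\equiv E\left[(X - E(X))^2\right] \\
&\equiv E\left[(X - \mu_X)^2\right] \\
&\equiv E\left[X^2 - 2\mu_X X + \mu_X^2\right] \\
&= E\left[X^2\right] - 2\mu_X E[X] + \mu_X^2 \\
&= E\left[X^2\right] - 2\mu_X^2 + \mu_X^2 \\
&= E\left[X^2\right] - \mu_X^2 \\
&= \mu'_2 - \mu_X^2
\end{aligned}
\tag{89}
\]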



3.2. Moment generating functions.

3.2.1. Definition of a moment generating function. The moment generating function of a random variable X is given by

\[ M_X(t) = E\, e^{tX} \tag{90} \]

provided that the expectation exists for t in some neighborhood of 0. That is, there is an h > 0 such that, for all t in −h < t < h, $E\, e^{tX}$ exists. We can write $M_X(t)$ as

\[
M_X(t) =
\begin{cases}
\displaystyle\int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx & \text{if X is continuous} \\[2ex]
\displaystyle\sum_x e^{tx}\, P(X = x) & \text{if X is discrete}
\end{cases}
\tag{91}
\]

To understand why we call this a moment generating function consider first the discrete case. We can write $e^{tx}$ in an alternative way using a Maclaurin series expansion. The Maclaurin series of a function f(t) is given by

\[
\begin{aligned}
f(t) &= \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}\, t^n = \sum_{n=0}^{\infty} f^{(n)}(0)\, \frac{t^n}{n!} \\
&= f(0) + \frac{f^{(1)}(0)}{1!}\, t + \frac{f^{(2)}(0)}{2!}\, t^2 + \frac{f^{(3)}(0)}{3!}\, t^3 + \cdots \\
&= f(0) + f^{(1)}(0)\, \frac{t}{1!} + f^{(2)}(0)\, \frac{t^2}{2!} + f^{(3)}(0)\, \frac{t^3}{3!} + \cdots
\end{aligned}
\tag{92}
\]

where $f^{(n)}$ is the n-th derivative of the function with respect to t and $f^{(n)}(0)$ is the n-th derivative of f with respect to t evaluated at t = 0. For the function $e^{tx}$, the requisite derivatives are

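\[
\begin{aligned}
\frac{d\, e^{tx}}{dt} &= x e^{tx}, & \left. \frac{d\, e^{tx}}{dt} \right|_{t=0} &= x \\
\frac{d^2 e^{tx}}{dt^2} &= x^2 e^{tx}, & \left. \frac{d^2 e^{tx}}{dt^2} \right|_{t=0} &= x^2 \\
\frac{d^3 e^{tx}}{dt^3} &= x^3 e^{tx}, & \left. \frac{d^3 e^{tx}}{dt^3} \right|_{t=0} &= x^3 \\
&\;\;\vdots \\
\frac{d^j e^{tx}}{dt^j} &= x^j e^{tx}, & \left. \frac{d^j e^{tx}}{dt^j} \right|_{t=0} &= x^j
\end{aligned}
\tag{93}
\]

We can then write the Maclaurin series as

\[
\begin{aligned}
e^{tx} &= \sum_{n=0}^{\infty} \frac{d^n e^{tx}}{dt^n}(0)\, \frac{t^n}{n!} \\
&= \sum_{n=0}^{\infty} x^n\, \frac{t^n}{n!} \\
&= 1 + tx + \frac{t^2 x^2}{2!} + \frac{t^3 x^3}{3!} + \cdots + \frac{t^r x^r}{r!} + \cdots
\end{aligned}
\tag{94}
\]

We can then compute $E(e^{tX}) = M_X(t)$ as

\[
\begin{aligned}
E\left[e^{tX}\right] = M_X(t) &= \sum_x e^{tx}\, p_X(x) \\
&= \sum_x \left[ 1 + tx + \frac{t^2 x^2}{2!} + \frac{t^3 x^3}{3!} + \cdots + \frac{t^r x^r}{r!} + \cdots \right] p_X(x) \\
&= \sum_x p_X(x) + t \sum_x x\, p_X(x) + \frac{t^2}{2!} \sum_x x^2 p_X(x) + \cdots + \frac{t^r}{r!} \sum_x x^r p_X(x) + \cdots \\
&= 1 + \mu t + \mu'_2\, \frac{t^2}{2!} + \mu'_3\, \frac{t^3}{3!} + \cdots + \mu'_r\, \frac{t^r}{r!} + \cdots
\end{aligned}
\tag{95}
\]

In the expansion, the coefficient of $\frac{t^r}{r!}$ is $\mu'_r$, the r-th moment about the origin of the random variable X.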

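3.2.2. Example derivation of a moment generating function. Find the moment-generating function of the random variable whose probability density is given by

\[
f_X(x) =
\begin{cases}
e^{-x} & \text{for } x > 0 \\
0 & \text{elsewhere}
\end{cases}
\tag{96}
\]

and use it to find an expression for $\mu'_r$. By definition

\[
\begin{aligned}
M_X(t) = E\left[e^{tX}\right] &= \int_{-\infty}^{\infty} e^{tx} f_X(x)\, dx \\
&= \int_0^{\infty} e^{tx} \cdot e^{-x}\, dx \\
&= \int_0^{\infty} e^{-x(1 - t)}\, dx \\
&= \left. \frac{-1}{1 - t}\, e^{-x(1 - t)} \right|_0^{\infty} \\
&= 0 - \left[ \frac{-1}{1 - t} \right] \\
&= \frac{1}{1 - t} \quad \text{for } t < 1
\end{aligned}
\tag{97}
\]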

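As is well known, when |t| < 1 the Maclaurin series for $\frac{1}{1 - t}$ is given by

\[
\begin{aligned}
M_X(t) = \frac{1}{1 - t} &= 1 + t + t^2 + t^3 + \cdots + t^r + \cdots \\
&= 1 + 1! \cdot \frac{t}{1!} + 2! \cdot \frac{t^2}{2!} + 3! \cdot \frac{t^3}{3!} + \cdots + r! \cdot \frac{t^r}{r!} + \cdots
\end{aligned}
\tag{98}
\]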

or we can derive it directly using equation 92. To derive it directly utilizing the Maclaurin series we need all the derivatives of the function $\frac{1}{1 - t}$ evaluated at 0. The derivatives are as follows

\[
\begin{aligned}
f(t) &= \frac{1}{1 - t} = (1 - t)^{-1} \\
f^{(1)} &= (1 - t)^{-2} \\
f^{(2)} &= 2\,(1 - t)^{-3} \\
f^{(3)} &= 6\,(1 - t)^{-4} \\
f^{(4)} &= 24\,(1 - t)^{-5} \\
f^{(5)} &= 120\,(1 - t)^{-6} \\
&\;\;\vdots \\
f^{(n)} &= n!\,(1 - t)^{-(n + 1)}
\end{aligned}
\tag{99}
\]

Evaluating them at zero gives

\[
\begin{aligned}
f(0) &= \frac{1}{1 - 0} = (1 - 0)^{-1} = 1 \\
f^{(1)}(0) &= (1 - 0)^{-2} = 1 = 1! \\
f^{(2)}(0) &= 2\,(1 - 0)^{-3} = 2 = 2! \\
f^{(3)}(0) &= 6\,(1 - 0)^{-4} = 6 = 3! \\
f^{(4)}(0) &= 24\,(1 - 0)^{-5} = 24 = 4! \\
f^{(5)}(0) &= 120\,(1 - 0)^{-6} = 120 = 5! \\
&\;\;\vdots \\
f^{(n)}(0) &= n!\,(1 - 0)^{-(n + 1)} = n!
\end{aligned}
\tag{100}
\]

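Now substituting in appropriate values for the derivatives of the function $f(t) = \frac{1}{1 - t}$ we obtain

\[
\begin{aligned}
f(t) &= \sum_{n=0}^{\infty} \frac{f^{(n)}(0)}{n!}\, t^n \\
&= f(0) + \frac{f^{(1)}(0)}{1!}\, t + \frac{f^{(2)}(0)}{2!}\, t^2 + \frac{f^{(3)}(0)}{3!}\, t^3 + \cdots \\
&= 1 + \frac{1!}{1!}\, t + \frac{2!}{2!}\, t^2 + \frac{3!}{3!}\, t^3 + \cdots \\
&= 1 + t + t^2 + t^3 + \cdots
\end{aligned}
\tag{101}
\]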

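A further issue is to determine the radius of convergence for this particular function. Consider an arbitrary series where the n-th term is denoted by $a_n$. The ratio test says that

\[ \text{If } \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = L < 1, \text{ then the series is absolutely convergent} \tag{102a} \]

\[ \text{If } \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = L > 1 \text{ or } \lim_{n \to \infty} \left| \frac{a_{n+1}}{a_n} \right| = \infty, \text{ then the series is divergent} \tag{102b} \]

Now consider the n-th term and the (n+1)-th term of the Maclaurin series expansion of $\frac{1}{1 - t}$.

\[
\begin{aligned}
a_n &= t^n \\
\lim_{n \to \infty} \left| \frac{t^{n+1}}{t^n} \right| &= \lim_{n \to \infty} |t| = L
\end{aligned}
\tag{103}
\]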

The only way for this limit to be less than one is for the absolute value of t to be less than one, i.e., $|t| < 1$. Now writing out the Maclaurin series as in equation 98 and remembering that, in the expansion, the coefficient of $\frac{t^r}{r!}$ is $\mu'_r$, the $r$th moment about the origin of the random variable,


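\[
\begin{aligned}
M_X(t) = \frac{1}{1-t} &= 1 + t + t^2 + t^3 + \cdots + t^r + \cdots \\
&= 1 + 1! \cdot \frac{t}{1!} + 2! \cdot \frac{t^2}{2!} + 3! \cdot \frac{t^3}{3!} + \cdots + r! \cdot \frac{t^r}{r!} + \cdots
\end{aligned}
\tag{104}
\]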

it is clear that $\mu'_r = r!$ for $r = 0, 1, 2, \ldots$ For this density function $E[X] = 1$ because the coefficient of $\frac{t^1}{1!}$ is 1. We can verify this by finding $E[X]$ directly by integrating.

\[
E(X) = \int_{0}^{\infty} x \cdot e^{-x}\,dx
\tag{105}
\]

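To do so we need to integrate by parts with $u = x$ and $dv = e^{-x}\,dx$. Then $du = dx$ and $v = -e^{-x}$. We then have

\[
\begin{aligned}
E(X) &= \int_{0}^{\infty} x \cdot e^{-x}\,dx, \qquad u = x,\ du = dx,\ v = -e^{-x},\ dv = e^{-x}\,dx \\
&= \left. -x e^{-x} \right|_{0}^{\infty} - \int_{0}^{\infty} -e^{-x}\,dx \\
&= [0 - 0] - \left[ \left. e^{-x} \right|_{0}^{\infty} \right] \\
&= 0 - [0 - 1] = 1
\end{aligned}
\tag{106}
\]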
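The result $\mu'_r = r!$ can also be checked numerically. The sketch below is an illustrative addition, not part of the original derivation: it approximates $E[X^r] = \int_0^\infty x^r e^{-x}\,dx$ with a composite Simpson rule, truncating the infinite upper limit at an assumed cutoff of 80. The helper names `simpson` and `raw_moment` are hypothetical.

```python
import math


def simpson(f, a: float, b: float, n: int = 20000) -> float:
    """Composite Simpson's rule on [a, b] using an even number n of subintervals."""
    if n % 2:
        n += 1
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior points alternate between weight 4 (odd i) and weight 2 (even i).
        total += (4.0 if i % 2 else 2.0) * f(a + i * h)
    return total * h / 3.0


def raw_moment(r: int, upper: float = 80.0) -> float:
    """Approximate E[X^r] for the density f(x) = exp(-x), x > 0.

    The upper limit is truncated at `upper` (an assumption); the neglected
    tail is negligible for the small values of r used here.
    """
    return simpson(lambda x: x ** r * math.exp(-x), 0.0, upper)


if __name__ == "__main__":
    for r in range(5):
        print(r, raw_moment(r), math.factorial(r))
```

Each printed approximation should agree closely with the corresponding factorial, in line with equation 104.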

3.2.3. Moment property of the moment generating functions for discrete random variables.

Theorem 8. If $M_X(t)$ exists, then for any positive integer k,

\[
\left. \frac{d^k M_X(t)}{dt^k} \right|_{t=0} = M_X^{(k)}(0) = \mu'_k .
\tag{107}
\]

In other words, if you find the $k$th derivative of $M_X(t)$ with respect to t and then set $t = 0$, the result will be $\mu'_k$.

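Proof. $\frac{d^k M_X(t)}{dt^k}$, or $M_X^{(k)}(t)$, is the $k$th derivative of $M_X(t)$ with respect to t. From equation 95 we know that

\[
M_X(t) = E\left(e^{tX}\right) = 1 + t\mu'_1 + \frac{t^2}{2!}\mu'_2 + \frac{t^3}{3!}\mu'_3 + \cdots
\tag{108}
\]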

It then follows that

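\[
M_X^{(1)}(t) = \mu'_1 + \frac{2t}{2!}\mu'_2 + \frac{3t^2}{3!}\mu'_3 + \cdots
\tag{109a}
\]

\[
M_X^{(2)}(t) = \mu'_2 + \frac{2t}{2!}\mu'_3 + \frac{3t^2}{3!}\mu'_4 + \cdots
\tag{109b}
\]

where we note that $\frac{n}{n!} = \frac{1}{(n-1)!}$. In general we find that

\[
M_X^{(k)}(t) = \mu'_k + \frac{2t}{2!}\mu'_{k+1} + \frac{3t^2}{3!}\mu'_{k+2} + \cdots .
\tag{110}
\]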

Setting t = 0 in each of the above derivatives, we obtain


\[
M_X^{(1)}(0) = \mu'_1
\tag{111a}
\]

\[
M_X^{(2)}(0) = \mu'_2
\tag{111b}
\]

and, in general,

\[
M_X^{(k)}(0) = \mu'_k
\tag{112}
\]



These operations involve interchanging derivatives and infinite sums, which can be justified if $M_X(t)$ exists.
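Theorem 8 can be illustrated numerically for a discrete random variable. The sketch below is an illustrative addition under stated assumptions: the distribution (a fair six-sided die), the step size `h`, and the helper names are all hypothetical choices. It estimates the first two derivatives of the moment generating function at $t = 0$ by central differences and compares them with the moments computed directly from the probability function.

```python
import math


def mgf(pmf: dict, t: float) -> float:
    """M_X(t) = E[exp(tX)] for a discrete pmf given as {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())


def first_derivative_at_zero(pmf: dict, h: float = 1e-4) -> float:
    """Central-difference estimate of M'(0)."""
    return (mgf(pmf, h) - mgf(pmf, -h)) / (2.0 * h)


def second_derivative_at_zero(pmf: dict, h: float = 1e-4) -> float:
    """Central-difference estimate of M''(0)."""
    return (mgf(pmf, h) - 2.0 * mgf(pmf, 0.0) + mgf(pmf, -h)) / (h * h)


# Assumed example distribution: a fair six-sided die.
die = {x: 1.0 / 6.0 for x in range(1, 7)}

mean_direct = sum(p * x for x, p in die.items())          # E[X] = 3.5
second_direct = sum(p * x ** 2 for x, p in die.items())   # E[X^2] = 91/6

if __name__ == "__main__":
    print(first_derivative_at_zero(die), mean_direct)
    print(second_derivative_at_zero(die), second_direct)
```

Finite differences are used here only because they require no symbolic algebra; they are approximations, so the two columns agree to several decimal places rather than exactly.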

3.2.4. Moment property of the moment generating functions for continuous random variables.

Theorem 9. If X has mgf $M_X(t)$, then

\[
E X^n = M_X^{(n)}(0),
\tag{113}
\]

where we define

\[
M_X^{(n)}(0) = \left. \frac{d^n}{dt^n} M_X(t) \right|_{t=0}
\tag{114}
\]

The $n$th moment of the distribution is equal to the $n$th derivative of $M_X(t)$ evaluated at $t = 0$.

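Proof. We will assume that we can differentiate under the integral sign and differentiate equation 91.

\[
\begin{aligned}
\frac{d}{dt} M_X(t) &= \frac{d}{dt} \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} \left( \frac{d}{dt} e^{tx} \right) f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} \left( x e^{tx} \right) f_X(x)\,dx \\
&= E\left( X e^{tX} \right)
\end{aligned}
\tag{115}
\]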

Now evaluate equation 115 at t = 0.

\[
\left. \frac{d}{dt} M_X(t) \right|_{t=0} = \left. E\left( X e^{tX} \right) \right|_{t=0} = EX
\tag{116}
\]

We can proceed in a similar fashion for other derivatives. We illustrate for n = 2.


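\[
\begin{aligned}
\frac{d^2}{dt^2} M_X(t) &= \frac{d^2}{dt^2} \int_{-\infty}^{\infty} e^{tx} f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} \left( \frac{d^2}{dt^2} e^{tx} \right) f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} \left( \frac{d}{dt}\, x e^{tx} \right) f_X(x)\,dx \\
&= \int_{-\infty}^{\infty} \left( x^2 e^{tx} \right) f_X(x)\,dx \\
&= E\left( X^2 e^{tX} \right)
\end{aligned}
\tag{117}
\]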

Now evaluate equation 117 at t = 0.

\[
\left. \frac{d^2}{dt^2} M_X(t) \right|_{t=0} = \left. E\left( X^2 e^{tX} \right) \right|_{t=0} = EX^2
\tag{118}
\]



3.3. Some properties of moment generating functions. If a and b are constants, then

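\[
M_{X+a}(t) = E\left( e^{(X+a)t} \right) = e^{at} \cdot M_X(t)
\tag{119a}
\]

\[
M_{bX}(t) = E\left( e^{bXt} \right) = M_X(bt)
\tag{119b}
\]

\[
M_{\frac{X+a}{b}}(t) = E\left( e^{\left( \frac{X+a}{b} \right) t} \right) = e^{\frac{a}{b} t} \cdot M_X\!\left( \frac{t}{b} \right)
\tag{119c}
\]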
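The three properties in equations 119a to 119c can be checked directly for a small discrete distribution. The sketch below is an illustrative addition: the probability function `pmf_x`, the constants `a`, `b`, `t`, and the helper names `mgf` and `affine` are all assumed values chosen for the demonstration. Each left-hand side is computed from the distribution of the transformed variable, and each right-hand side from the formula.

```python
import math


def mgf(pmf: dict, t: float) -> float:
    """M(t) = E[exp(tY)] for a discrete pmf given as {value: probability}."""
    return sum(p * math.exp(t * y) for y, p in pmf.items())


def affine(pmf: dict, scale: float, shift: float) -> dict:
    """Probability function of scale * X + shift."""
    out = {}
    for x, p in pmf.items():
        y = scale * x + shift
        out[y] = out.get(y, 0.0) + p
    return out


# Assumed example values (not from the text).
pmf_x = {0: 0.2, 1: 0.5, 3: 0.3}
a, b, t = 1.5, 2.0, 0.4

# (119a): M_{X+a}(t) = e^{at} M_X(t)
lhs_a = mgf(affine(pmf_x, 1.0, a), t)
rhs_a = math.exp(a * t) * mgf(pmf_x, t)

# (119b): M_{bX}(t) = M_X(bt)
lhs_b = mgf(affine(pmf_x, b, 0.0), t)
rhs_b = mgf(pmf_x, b * t)

# (119c): M_{(X+a)/b}(t) = e^{(a/b)t} M_X(t/b)
lhs_c = mgf(affine(pmf_x, 1.0 / b, a / b), t)
rhs_c = math.exp(a * t / b) * mgf(pmf_x, t / b)

if __name__ == "__main__":
    print(lhs_a, rhs_a)
    print(lhs_b, rhs_b)
    print(lhs_c, rhs_c)
```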

3.4. Examples of moment generating functions.

3.4.1. Example 1. Consider a random variable with two possible values, 0 and 1, and corresponding

probabilities f(1) = p, f(0) = 1-p where we write f(·) for p(·). For this distribution

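\[
\begin{aligned}
M_X(t) = E\left( e^{tX} \right) &= e^{t \cdot 1} f(1) + e^{t \cdot 0} f(0) \\
&= e^{t} p + e^{0} (1 - p) \\
&= e^{0} (1 - p) + e^{t} p \\
&= 1 - p + e^{t} p \\
&= 1 + p \left( e^{t} - 1 \right)
\end{aligned}
\tag{120}
\]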

The derivatives are


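\[
\begin{aligned}
M_X^{(1)}(t) &= p e^{t} \\
M_X^{(2)}(t) &= p e^{t} \\
M_X^{(3)}(t) &= p e^{t} \\
&\ \ \vdots \\
M_X^{(k)}(t) &= p e^{t}
\end{aligned}
\tag{121}
\]

Thus

\[
E\left( X^k \right) = M_X^{(k)}(0) = p e^{0} = p
\tag{122}
\]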

We can also find this by expanding $M_X(t)$ using the Maclaurin series for the moment generating function for this problem

\[
M_X(t) = E\left( e^{tX} \right) = 1 + p \left( e^{t} - 1 \right)
\tag{123}
\]

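To obtain this we first need the series expansion of $e^t$. All derivatives of $e^t$ are equal to $e^t$. The expansion is then given by

\[
\begin{aligned}
e^{t} &= \sum_{n=0}^{\infty} \left. \frac{d^n e^{t}}{dt^n} \right|_{t=0} \frac{t^n}{n!} = \sum_{n=0}^{\infty} \frac{t^n}{n!} \\
&= 1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots + \frac{t^r}{r!} + \cdots
\end{aligned}
\tag{124}
\]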

Substituting equation 124 into equation 123 we obtain

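\[
\begin{aligned}
M_X(t) &= 1 + p e^{t} - p \\
&= 1 + p \left[ 1 + t + \frac{t^2}{2!} + \frac{t^3}{3!} + \cdots + \frac{t^r}{r!} + \cdots \right] - p \\
&= 1 + p + pt + p\,\frac{t^2}{2!} + p\,\frac{t^3}{3!} + \cdots + p\,\frac{t^r}{r!} + \cdots - p \\
&= 1 + pt + p\,\frac{t^2}{2!} + p\,\frac{t^3}{3!} + \cdots + p\,\frac{t^r}{r!} + \cdots
\end{aligned}
\tag{125}
\]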

We can then see that all moments are equal to p. This is also clear by direct computation


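\[
\begin{aligned}
E(X) &= (1)\,p + (0)\,(1 - p) = p \\
E\left( X^2 \right) &= \left( 1^2 \right) p + \left( 0^2 \right) (1 - p) = p \\
E\left( X^3 \right) &= \left( 1^3 \right) p + \left( 0^3 \right) (1 - p) = p \\
&\ \ \vdots \\
E\left( X^k \right) &= \left( 1^k \right) p + \left( 0^k \right) (1 - p) = p
\end{aligned}
\tag{126}
\]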
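The two routes in this example, the definition $E(e^{tX})$ and the closed form $1 + p(e^t - 1)$, can be compared numerically, together with the direct moment computation in equation 126. The sketch below is an illustrative addition; the function names and the value of `p` are assumptions made for the demonstration.

```python
import math


def bernoulli_mgf_from_definition(p: float, t: float) -> float:
    """E[exp(tX)] summed over the two possible values 1 and 0."""
    return math.exp(t * 1) * p + math.exp(t * 0) * (1.0 - p)


def bernoulli_mgf_closed_form(p: float, t: float) -> float:
    """Closed form 1 + p(e^t - 1) from equation 120."""
    return 1.0 + p * (math.exp(t) - 1.0)


def bernoulli_raw_moment(p: float, k: int) -> float:
    """E[X^k] = (1^k) p + (0^k)(1 - p), as in equation 126."""
    return (1 ** k) * p + (0 ** k) * (1.0 - p)


if __name__ == "__main__":
    p = 0.3  # assumed success probability
    for t in (-1.0, 0.0, 0.5, 2.0):
        print(t, bernoulli_mgf_from_definition(p, t), bernoulli_mgf_closed_form(p, t))
    print([bernoulli_raw_moment(p, k) for k in range(1, 6)])
```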

3.4.2. Example 2. Consider the exponential distribution which has a density function given by

\[
f_X(x) = \frac{1}{\lambda}\, e^{-x/\lambda}, \qquad 0 \le x < \infty,\ \lambda > 0
\tag{127}
\]

For λt <1, we have

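\[
\begin{aligned}
M_X(t) &= \int_{0}^{\infty} e^{tx}\, \frac{1}{\lambda}\, e^{-x/\lambda}\,dx \\
&= \frac{1}{\lambda} \int_{0}^{\infty} e^{-\left( \frac{1}{\lambda} - t \right) x}\,dx \\
&= \frac{1}{\lambda} \int_{0}^{\infty} e^{-\left( \frac{1 - \lambda t}{\lambda} \right) x}\,dx \\
&= \frac{1}{\lambda} \left[ \frac{-\lambda}{1 - \lambda t} \right] \left. e^{-\left( \frac{1 - \lambda t}{\lambda} \right) x} \right|_{0}^{\infty} \\
&= \left[ \frac{-1}{1 - \lambda t} \right] \left. e^{-\left( \frac{1 - \lambda t}{\lambda} \right) x} \right|_{0}^{\infty} \\
&= 0 - \left[ \frac{-1}{1 - \lambda t} \right] \\
&= \frac{1}{1 - \lambda t}
\end{aligned}
\tag{128}
\]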

We can then ﬁnd the moments by differentiation. The ﬁrst moment is

\[
E(X) = \left. \frac{d}{dt} (1 - \lambda t)^{-1} \right|_{t=0} = \left. \lambda\, (1 - \lambda t)^{-2} \right|_{t=0} = \lambda
\tag{129}
\]

The second moment is


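\[
\begin{aligned}
E\left( X^2 \right) &= \left. \frac{d^2}{dt^2} (1 - \lambda t)^{-1} \right|_{t=0} \\
&= \left. \frac{d}{dt} \left( \lambda\, (1 - \lambda t)^{-2} \right) \right|_{t=0} \\
&= \left. 2\lambda^2\, (1 - \lambda t)^{-3} \right|_{t=0} \\
&= 2\lambda^2
\end{aligned}
\tag{130}
\]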
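The moments $E(X) = \lambda$ and $E(X^2) = 2\lambda^2$ can be sanity-checked by simulation. The sketch below is an illustrative addition that swaps in Monte Carlo sampling for the differentiation used in the text; the function name, the sample size, and the seed are assumptions. Note that Python's `random.expovariate` is parameterized by the rate, so a mean of $\lambda$ corresponds to a rate of $1/\lambda$.

```python
import random


def simulate_exponential_moments(lam: float, n_draws: int = 200_000, seed: int = 0):
    """Return sample estimates of (E[X], E[X^2]) for an exponential with mean lam."""
    rng = random.Random(seed)
    rate = 1.0 / lam  # expovariate expects the rate 1/lam, not the mean lam
    total = 0.0
    total_sq = 0.0
    for _ in range(n_draws):
        x = rng.expovariate(rate)
        total += x
        total_sq += x * x
    return total / n_draws, total_sq / n_draws


if __name__ == "__main__":
    lam = 2.0  # assumed value of the mean parameter
    m1, m2 = simulate_exponential_moments(lam)
    print("E[X]   estimate:", m1, " theory:", lam)
    print("E[X^2] estimate:", m2, " theory:", 2.0 * lam ** 2)
```

Because these are sample averages, they match the theoretical values only up to sampling error, which shrinks as `n_draws` grows.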

3.4.3. Example 3. Consider the normal distribution which has a density function given by

\[
f(x;\, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}
\tag{131}
\]

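Let $g(X) = X - \mu$, where X is a normally distributed random variable with mean $\mu$ and variance $\sigma^2$. Find the moment-generating function for $(X - \mu)$. This is the moment generating function for central moments of the normal distribution.

\[
M_{X-\mu}(t) = E\left[ e^{t(X - \mu)} \right] = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{t(x - \mu)}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}\,dx
\tag{132}
\]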

To integrate, let u = x - µ. Then du = dx and

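\[
\begin{aligned}
M_{X-\mu}(t) &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tu}\, e^{-\frac{u^2}{2\sigma^2}}\,du \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{tu - \frac{u^2}{2\sigma^2}}\,du \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} e^{\left[ \frac{1}{2\sigma^2} \left( 2\sigma^2 t u - u^2 \right) \right]}\,du \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[ \frac{-1}{2\sigma^2} \left( u^2 - 2\sigma^2 t u \right) \right] du
\end{aligned}
\tag{133}
\]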

To simplify the integral, complete the square in the exponent of e. That is, write the second term in brackets as

\[
\left( u^2 - 2\sigma^2 t u \right) = \left( u^2 - 2\sigma^2 t u + \sigma^4 t^2 - \sigma^4 t^2 \right)
\tag{134}
\]

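This then will give

\[
\begin{aligned}
\exp\left[ \frac{-1}{2\sigma^2} \left( u^2 - 2\sigma^2 t u \right) \right]
&= \exp\left[ \frac{-1}{2\sigma^2} \left( u^2 - 2\sigma^2 t u + \sigma^4 t^2 - \sigma^4 t^2 \right) \right] \\
&= \exp\left[ \frac{-1}{2\sigma^2} \left( u^2 - 2\sigma^2 t u + \sigma^4 t^2 \right) \right] \cdot \exp\left[ \frac{-1}{2\sigma^2} \left( -\sigma^4 t^2 \right) \right] \\
&= \exp\left[ \frac{-1}{2\sigma^2} \left( u^2 - 2\sigma^2 t u + \sigma^4 t^2 \right) \right] \cdot \exp\left[ \frac{\sigma^2 t^2}{2} \right]
\end{aligned}
\tag{135}
\]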

Now substitute equation 135 into equation 133 and simplify. We begin by making the substitution and factoring out the term $\exp\left[ \frac{\sigma^2 t^2}{2} \right]$


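\[
\begin{aligned}
M_{X-\mu}(t) &= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[ \frac{-1}{2\sigma^2} \left( u^2 - 2\sigma^2 t u \right) \right] du \\
&= \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[ \frac{-1}{2\sigma^2} \left( u^2 - 2\sigma^2 t u + \sigma^4 t^2 \right) \right] \cdot \exp\left[ \frac{\sigma^2 t^2}{2} \right] du \\
&= \exp\left[ \frac{\sigma^2 t^2}{2} \right] \frac{1}{\sigma\sqrt{2\pi}} \int_{-\infty}^{\infty} \exp\left[ \frac{-1}{2\sigma^2} \left( u^2 - 2\sigma^2 t u + \sigma^4 t^2 \right) \right] du
\end{aligned}
\tag{136}
\]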

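Now move $\frac{1}{\sigma\sqrt{2\pi}}$ inside the integral sign, write $\left( u^2 - 2\sigma^2 t u + \sigma^4 t^2 \right)$ as the perfect square $\left( u - \sigma^2 t \right)^2$, and simplify

\[
\begin{aligned}
M_{X-\mu}(t) &= \exp\left[ \frac{\sigma^2 t^2}{2} \right] \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ \frac{-1}{2\sigma^2} \left( u^2 - 2\sigma^2 t u + \sigma^4 t^2 \right) \right] du \\
&= \exp\left[ \frac{\sigma^2 t^2}{2} \right] \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}} \exp\left[ \frac{-1}{2\sigma^2} \left( u - \sigma^2 t \right)^2 \right] du \\
&= e^{\frac{\sigma^2 t^2}{2}} \int_{-\infty}^{\infty} \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{1}{2} \left( \frac{u - \sigma^2 t}{\sigma} \right)^2} du
\end{aligned}
\tag{137}
\]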

The function inside the integral is a normal density function with mean and variance equal to $\sigma^2 t$ and $\sigma^2$, respectively. Hence the integral is equal to 1. Then

\[
M_{X-\mu}(t) = e^{\frac{t^2 \sigma^2}{2}} .
\tag{138}
\]

The moments of $u = x - \mu$ can be obtained from $M_{X-\mu}(t)$ by differentiating. For example the first central moment is

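\[
E(X - \mu) = \left. \frac{d}{dt} \left( e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} = \left. t\sigma^2\, e^{\frac{t^2 \sigma^2}{2}} \right|_{t=0} = 0
\tag{139}
\]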

The second central moment is

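\[
\begin{aligned}
E(X - \mu)^2 &= \left. \frac{d^2}{dt^2} \left( e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \frac{d}{dt} \left( t\sigma^2\, e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \left( t^2 \sigma^4\, e^{\frac{t^2 \sigma^2}{2}} + \sigma^2\, e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \sigma^2
\end{aligned}
\tag{140}
\]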

The third central moment is


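\[
\begin{aligned}
E(X - \mu)^3 &= \left. \frac{d^3}{dt^3} \left( e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \frac{d}{dt} \left( t^2 \sigma^4\, e^{\frac{t^2 \sigma^2}{2}} + \sigma^2\, e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \left( t^3 \sigma^6\, e^{\frac{t^2 \sigma^2}{2}} + 2t\sigma^4\, e^{\frac{t^2 \sigma^2}{2}} + t\sigma^4\, e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \left( t^3 \sigma^6\, e^{\frac{t^2 \sigma^2}{2}} + 3t\sigma^4\, e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= 0
\end{aligned}
\tag{141}
\]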

The fourth central moment is

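\[
\begin{aligned}
E(X - \mu)^4 &= \left. \frac{d^4}{dt^4} \left( e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \frac{d}{dt} \left( t^3 \sigma^6\, e^{\frac{t^2 \sigma^2}{2}} + 3t\sigma^4\, e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \left( t^4 \sigma^8\, e^{\frac{t^2 \sigma^2}{2}} + 3t^2 \sigma^6\, e^{\frac{t^2 \sigma^2}{2}} + 3t^2 \sigma^6\, e^{\frac{t^2 \sigma^2}{2}} + 3\sigma^4\, e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \left( t^4 \sigma^8\, e^{\frac{t^2 \sigma^2}{2}} + 6t^2 \sigma^6\, e^{\frac{t^2 \sigma^2}{2}} + 3\sigma^4\, e^{\frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= 3\sigma^4
\end{aligned}
\tag{142}
\]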
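The central moments in equations 139 to 142 and the generating function in equation 138 can be checked by numerical integration against the normal density. The sketch below is an illustrative addition: the parameter values, the integration window of 12 standard deviations on each side of the mean, and the helper names are all assumptions.

```python
import math


def normal_pdf(x: float, mu: float, sigma: float) -> float:
    """Normal density with mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def expectation(g, mu: float, sigma: float, width: float = 12.0, n: int = 6000) -> float:
    """Riemann-sum estimate of E[g(X)] over mu +/- width * sigma.

    The density is negligible at the ends of this window, so an equally
    weighted sum on a uniform grid is accurate for smooth g.
    """
    a = mu - width * sigma
    h = 2.0 * width * sigma / n
    grid = (a + i * h for i in range(n + 1))
    return h * sum(g(x) * normal_pdf(x, mu, sigma) for x in grid)


def central_moment(k: int, mu: float, sigma: float) -> float:
    """Numerical estimate of E[(X - mu)^k]."""
    return expectation(lambda x: (x - mu) ** k, mu, sigma)


def central_mgf(t: float, mu: float, sigma: float) -> float:
    """Numerical estimate of E[exp(t (X - mu))]."""
    return expectation(lambda x: math.exp(t * (x - mu)), mu, sigma)


if __name__ == "__main__":
    mu, sigma = 2.0, 1.5  # assumed parameter values
    print([central_moment(k, mu, sigma) for k in range(1, 5)])
    print("theory:", [0.0, sigma ** 2, 0.0, 3.0 * sigma ** 4])
    print(central_mgf(0.5, mu, sigma), math.exp(0.5 ** 2 * sigma ** 2 / 2.0))
```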

3.4.4. Example 4. Now consider the raw moments of the normal distribution. The density function

is given by

\[
f(x;\, \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \cdot e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}
\tag{143}
\]

To ﬁnd the moment-generating function for X we integrate the following function.

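\[
M_X(t) = E\left[ e^{tX} \right] = \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{tx}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}\,dx
\tag{144}
\]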

First rewrite the integral as follows by putting the exponents over a common denominator.


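\[
\begin{aligned}
M_X(t) = E\left[ e^{tX} \right] &= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{tx}\, e^{-\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2}\,dx \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\frac{-1}{2\sigma^2} (x - \mu)^2 + tx}\,dx \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\frac{-1}{2\sigma^2} (x - \mu)^2 + \frac{2\sigma^2 t x}{2\sigma^2}}\,dx \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\frac{-1}{2\sigma^2} \left[ (x - \mu)^2 - 2\sigma^2 t x \right]}\,dx
\end{aligned}
\tag{145}
\]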

Now square the term in the exponent and simplify

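\[
\begin{aligned}
M_X(t) = E\left[ e^{tX} \right] &= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\frac{-1}{2\sigma^2} \left[ x^2 - 2\mu x + \mu^2 - 2\sigma^2 t x \right]}\,dx \\
&= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\frac{-1}{2\sigma^2} \left[ x^2 - 2x \left( \mu + \sigma^2 t \right) + \mu^2 \right]}\,dx
\end{aligned}
\tag{146}
\]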

Now consider the exponent of e and complete the square for the portion in brackets as follows.

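\[
\begin{aligned}
x^2 - 2x \left( \mu + \sigma^2 t \right) + \mu^2
&= x^2 - 2x \left( \mu + \sigma^2 t \right) + \mu^2 + 2\mu\sigma^2 t + \sigma^4 t^2 - 2\mu\sigma^2 t - \sigma^4 t^2 \\
&= \left[ x - \left( \mu + \sigma^2 t \right) \right]^2 - 2\mu\sigma^2 t - \sigma^4 t^2
\end{aligned}
\tag{147}
\]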

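To simplify the integral, complete the square in the exponent of e by multiplying and dividing by $e^{\frac{2\mu\sigma^2 t + \sigma^4 t^2}{2\sigma^2}}$, that is, by inserting the factor

\[
\left[ e^{\frac{2\mu\sigma^2 t + \sigma^4 t^2}{2\sigma^2}} \right] \left[ e^{\frac{-2\mu\sigma^2 t - \sigma^4 t^2}{2\sigma^2}} \right] = 1
\tag{148}
\]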

in the following manner

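\[
\begin{aligned}
M_X(t) &= \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\frac{-1}{2\sigma^2} \left[ x^2 - 2x \left( \mu + \sigma^2 t \right) + \mu^2 \right]}\,dx \\
&= \left[ e^{\frac{2\mu\sigma^2 t + \sigma^4 t^2}{2\sigma^2}} \right] \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\frac{-1}{2\sigma^2} \left[ x^2 - 2x \left( \mu + \sigma^2 t \right) + \mu^2 \right]} \left[ e^{\frac{-2\mu\sigma^2 t - \sigma^4 t^2}{2\sigma^2}} \right] dx \\
&= \left[ e^{\frac{2\mu\sigma^2 t + \sigma^4 t^2}{2\sigma^2}} \right] \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\frac{-1}{2\sigma^2} \left[ x^2 - 2x \left( \mu + \sigma^2 t \right) + \mu^2 + 2\mu\sigma^2 t + \sigma^4 t^2 \right]}\,dx
\end{aligned}
\tag{149}
\]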

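Now write the bracketed expression

\[
x^2 - 2x \left( \mu + \sigma^2 t \right) + \mu^2 + 2\mu\sigma^2 t + \sigma^4 t^2
\tag{150}
\]

as a perfect square. Given we would like to have $(x - \text{something})^2$, try squaring $x - (\mu + \sigma^2 t)$ as follows

\[
\begin{aligned}
\left[ x - \left( \mu + \sigma^2 t \right) \right]^2 &= x^2 - 2 \left[ x \left( \mu + \sigma^2 t \right) \right] + \left( \mu + \sigma^2 t \right)^2 \\
&= x^2 - 2x \left( \mu + \sigma^2 t \right) + \mu^2 + 2\mu\sigma^2 t + \sigma^4 t^2
\end{aligned}
\tag{151}
\]

So $\left[ x - \left( \mu + \sigma^2 t \right) \right]$ is the square root of $x^2 - 2x \left( \mu + \sigma^2 t \right) + \mu^2 + 2\mu\sigma^2 t + \sigma^4 t^2$. Making the substitution in equation 149 we obtain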


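\[
\begin{aligned}
M_X(t) &= \left[ e^{\frac{2\mu\sigma^2 t + \sigma^4 t^2}{2\sigma^2}} \right] \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\frac{-1}{2\sigma^2} \left[ x^2 - 2x \left( \mu + \sigma^2 t \right) + \mu^2 + 2\mu\sigma^2 t + \sigma^4 t^2 \right]}\,dx \\
&= \left[ e^{\frac{2\mu\sigma^2 t + \sigma^4 t^2}{2\sigma^2}} \right] \frac{1}{\sqrt{2\pi\sigma^2}} \int_{-\infty}^{\infty} e^{\frac{-1}{2\sigma^2} \left( \left[ x - \left( \mu + \sigma^2 t \right) \right] \right)^2}\,dx
\end{aligned}
\tag{152}
\]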

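The expression to the right of $e^{\frac{2\mu\sigma^2 t + \sigma^4 t^2}{2\sigma^2}}$ is a normal density function with mean and variance equal to $\mu + \sigma^2 t$ and $\sigma^2$, respectively. Hence the integral is equal to 1. Then

\[
M_X(t) = \left[ e^{\frac{2\mu\sigma^2 t + \sigma^4 t^2}{2\sigma^2}} \right] = e^{\mu t + \frac{t^2 \sigma^2}{2}} .
\tag{153}
\]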

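The moments of X can be obtained from $M_X(t)$ by differentiating with respect to t. For example the first raw moment is

\[
E(X) = \left. \frac{d}{dt} \left( e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} = \left. \left( \mu + t\sigma^2 \right) e^{\mu t + \frac{t^2 \sigma^2}{2}} \right|_{t=0} = \mu
\tag{154}
\]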

The second raw moment is

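\[
\begin{aligned}
E\left( X^2 \right) &= \left. \frac{d^2}{dt^2} \left( e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \frac{d}{dt} \left( \left( \mu + t\sigma^2 \right) e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \left( \left( \mu + t\sigma^2 \right)^2 e^{\mu t + \frac{t^2 \sigma^2}{2}} + \sigma^2\, e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \mu^2 + \sigma^2
\end{aligned}
\tag{155}
\]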

The third raw moment is

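\[
\begin{aligned}
E\left( X^3 \right) &= \left. \frac{d^3}{dt^3} \left( e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \frac{d}{dt} \left( \left( \mu + t\sigma^2 \right)^2 e^{\mu t + \frac{t^2 \sigma^2}{2}} + \sigma^2\, e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \left[ \left( \mu + t\sigma^2 \right)^3 e^{\mu t + \frac{t^2 \sigma^2}{2}} + 2\sigma^2 \left( \mu + t\sigma^2 \right) e^{\mu t + \frac{t^2 \sigma^2}{2}} + \sigma^2 \left( \mu + t\sigma^2 \right) e^{\mu t + \frac{t^2 \sigma^2}{2}} \right] \right|_{t=0} \\
&= \left. \left( \left( \mu + t\sigma^2 \right)^3 e^{\mu t + \frac{t^2 \sigma^2}{2}} + 3\sigma^2 \left( \mu + t\sigma^2 \right) e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \mu^3 + 3\sigma^2 \mu
\end{aligned}
\tag{156}
\]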

The fourth raw moment is


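\[
\begin{aligned}
E\left( X^4 \right) &= \left. \frac{d^4}{dt^4} \left( e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \frac{d}{dt} \left( \left( \mu + t\sigma^2 \right)^3 e^{\mu t + \frac{t^2 \sigma^2}{2}} + 3\sigma^2 \left( \mu + t\sigma^2 \right) e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \left( \left( \mu + t\sigma^2 \right)^4 e^{\mu t + \frac{t^2 \sigma^2}{2}} + 3\sigma^2 \left( \mu + t\sigma^2 \right)^2 e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&\quad + \left. \left( 3\sigma^2 \left( \mu + t\sigma^2 \right)^2 e^{\mu t + \frac{t^2 \sigma^2}{2}} + 3\sigma^4\, e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \left. \left( \left( \mu + t\sigma^2 \right)^4 e^{\mu t + \frac{t^2 \sigma^2}{2}} + 6\sigma^2 \left( \mu + t\sigma^2 \right)^2 e^{\mu t + \frac{t^2 \sigma^2}{2}} + 3\sigma^4\, e^{\mu t + \frac{t^2 \sigma^2}{2}} \right) \right|_{t=0} \\
&= \mu^4 + 6\mu^2 \sigma^2 + 3\sigma^4
\end{aligned}
\tag{157}
\]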
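The raw moments in equations 154 to 157 can be cross-checked against the central moments from Example 3 without any further differentiation. The sketch below is an illustrative addition that uses a different route from the text: it expands $X^n = ((X - \mu) + \mu)^n$ with the binomial theorem and takes expectations term by term. The function names and parameter values are assumptions.

```python
import math


def normal_central_moments(sigma: float) -> list:
    """Central moments of orders 0 through 4: 1, 0, sigma^2, 0, 3 sigma^4."""
    return [1.0, 0.0, sigma ** 2, 0.0, 3.0 * sigma ** 4]


def raw_moment_from_central(n: int, mu: float, sigma: float) -> float:
    """E[X^n] = sum_j C(n, j) mu^(n-j) E[(X - mu)^j], valid here for n <= 4."""
    central = normal_central_moments(sigma)
    return sum(math.comb(n, j) * mu ** (n - j) * central[j] for j in range(n + 1))


def raw_moment_closed_form(n: int, mu: float, sigma: float) -> float:
    """Closed forms for n = 1..4 as given in equations 154 to 157."""
    formulas = {
        1: mu,
        2: mu ** 2 + sigma ** 2,
        3: mu ** 3 + 3.0 * sigma ** 2 * mu,
        4: mu ** 4 + 6.0 * mu ** 2 * sigma ** 2 + 3.0 * sigma ** 4,
    }
    return formulas[n]


if __name__ == "__main__":
    mu, sigma = 2.0, 1.5  # assumed parameter values
    for n in range(1, 5):
        print(n, raw_moment_from_central(n, mu, sigma), raw_moment_closed_form(n, mu, sigma))
```

Agreement between the two columns indicates that the raw-moment formulas are consistent with the central moments $0$, $\sigma^2$, $0$, and $3\sigma^4$ obtained earlier.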

4. CHEBYSHEV’S INEQUALITY

Chebyshev’s inequality applies equally well to discrete and continuous random variables. We

state it here as a theorem.

4.1. A Theorem of Chebyshev.

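Theorem 10. Let X be a random variable with mean $\mu$ and finite variance $\sigma^2$. Then, for any constant $k > 0$,

\[
P\left( |X - \mu| < k\sigma \right) \ge 1 - \frac{1}{k^2} \quad \text{or} \quad P\left( |X - \mu| \ge k\sigma \right) \le \frac{1}{k^2} .
\tag{158}
\]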

The result applies for any probability distribution, whether the probability histogram is bell-shaped or not. The results of the theorem are very conservative in the sense that the actual probability that X is in the interval $\mu \pm k\sigma$ usually exceeds the lower bound for the probability, $1 - 1/k^2$, by a considerable amount.

Chebyshev's theorem enables us to find bounds for probabilities that ordinarily would have to be obtained by tedious mathematical manipulations (integration or summation). We often can obtain estimates of the means and variances of random variables without specifying the distribution of the variable. In situations like these, Chebyshev's inequality provides meaningful bounds for probabilities of interest.
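To give a sense of how conservative the bound can be, the sketch below compares the Chebyshev upper bound $1/k^2$ with the exact two-sided tail probability for one specific case. It is an illustrative addition: the choice of a normal distribution is an assumption made only for the comparison, since the theorem itself requires no distributional form, and the function names are hypothetical. For a normal random variable, $P(|X - \mu| \ge k\sigma)$ equals the complementary error function evaluated at $k/\sqrt{2}$.

```python
import math


def chebyshev_upper_bound(k: float) -> float:
    """Upper bound 1/k^2 on P(|X - mu| >= k sigma) from equation 158."""
    return 1.0 / (k * k)


def normal_two_sided_tail(k: float) -> float:
    """Exact P(|X - mu| >= k sigma) when X is normally distributed."""
    return math.erfc(k / math.sqrt(2.0))


if __name__ == "__main__":
    for k in (1.5, 2.0, 3.0):
        print(f"k={k}: Chebyshev bound={chebyshev_upper_bound(k):.4f}, "
              f"normal tail={normal_two_sided_tail(k):.4f}")
```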

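Proof. Let $f_X(x)$ denote the density function of X. Then

\[
\begin{aligned}
V(X) = \sigma^2 &= \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx \\
&= \int_{-\infty}^{\mu - k\sigma} (x - \mu)^2 f_X(x)\,dx + \int_{\mu - k\sigma}^{\mu + k\sigma} (x - \mu)^2 f_X(x)\,dx + \int_{\mu + k\sigma}^{\infty} (x - \mu)^2 f_X(x)\,dx .
\end{aligned}
\tag{159}
\]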


The second integral is always greater than or equal to zero.

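Now consider the relationship between $(x - \mu)^2$ and $k^2 \sigma^2$.

\[
\begin{aligned}
x \le \mu - k\sigma &\Rightarrow -x \ge k\sigma - \mu \\
&\Rightarrow \mu - x \ge k\sigma \\
&\Rightarrow (\mu - x)^2 \ge k^2 \sigma^2 \\
&\Rightarrow (x - \mu)^2 \ge k^2 \sigma^2
\end{aligned}
\tag{160}
\]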

And similarly,

\[
\begin{aligned}
x \ge \mu + k\sigma &\Rightarrow x - \mu \ge k\sigma \\
&\Rightarrow (x - \mu)^2 \ge k^2 \sigma^2
\end{aligned}
\tag{161}
\]

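Now replace $(x - \mu)^2$ with $k^2 \sigma^2$ in the first and third integrals of equation 159 to obtain the inequality

\[
V(X) = \sigma^2 \ge \int_{-\infty}^{\mu - k\sigma} k^2 \sigma^2 f_X(x)\,dx + \int_{\mu + k\sigma}^{\infty} k^2 \sigma^2 f_X(x)\,dx .
\tag{162}
\]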

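Then

\[
\sigma^2 \ge k^2 \sigma^2 \left[ \int_{-\infty}^{\mu - k\sigma} f_X(x)\,dx + \int_{\mu + k\sigma}^{+\infty} f_X(x)\,dx \right]
\tag{163}
\]

We can write this in the following useful manner

\[
\begin{aligned}
\sigma^2 &\ge k^2 \sigma^2 \left\{ P(X \le \mu - k\sigma) + P(X \ge \mu + k\sigma) \right\} \\
&= k^2 \sigma^2\, P\left( |X - \mu| \ge k\sigma \right).
\end{aligned}
\tag{164}
\]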

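Dividing by $k^2 \sigma^2$, we obtain

\[
P\left( |X - \mu| \ge k\sigma \right) \le \frac{1}{k^2},
\tag{165}
\]

or, equivalently,

\[
P\left( |X - \mu| < k\sigma \right) \ge 1 - \frac{1}{k^2} .
\tag{166}
\]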



4.2. Example. The number of accidents that occur during a given month at a particular intersection, X, tabulated by a group of Boy Scouts over a long time period, is found to have a mean of 12 and a standard deviation of 2. The underlying distribution is not known. What is the probability that, next month, X will be greater than eight but less than sixteen? We thus want $P[8 < X < 16]$. We can write equation 158 in the following useful manner.

\[
P\left[ (\mu - k\sigma) < X < (\mu + k\sigma) \right] \ge 1 - \frac{1}{k^2}
\tag{167}
\]


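For this problem $\mu = 12$ and $\sigma = 2$, so $\mu - k\sigma = 12 - 2k$. We can solve this equation for the k that gives us the desired bounds on the probability.

\[
\begin{aligned}
\mu - k\sigma = 12 - (k)(2) &= 8 \\
\Rightarrow 2k &= 4 \\
\Rightarrow k &= 2 \\
\text{and} \qquad 12 + (k)(2) &= 16 \\
\Rightarrow 2k &= 4 \\
\Rightarrow k &= 2
\end{aligned}
\tag{168}
\]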

We then obtain

\[
P\left[ 8 < X < 16 \right] \ge 1 - \frac{1}{2^2} = 1 - \frac{1}{4} = \frac{3}{4}
\tag{169}
\]

Therefore the probability that X is between 8 and 16 is at least 3/4.
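The arithmetic in this example can be packaged as a small helper. The sketch below is an illustrative addition; the function name `chebyshev_lower_bound` and its symmetry check are assumptions, and the helper only handles intervals centered on the mean, as in equation 167.

```python
def chebyshev_lower_bound(mu: float, sigma: float, lower: float, upper: float) -> float:
    """Chebyshev lower bound on P(lower < X < upper) for an interval symmetric about mu.

    Solves mu - k*sigma = lower and mu + k*sigma = upper for k and
    returns 1 - 1/k^2.
    """
    k_low = (mu - lower) / sigma
    k_high = (upper - mu) / sigma
    if k_low <= 0 or abs(k_low - k_high) > 1e-12:
        raise ValueError("interval must be symmetric about mu with positive width")
    return 1.0 - 1.0 / (k_low ** 2)


if __name__ == "__main__":
    # Values from the accident example: mean 12, standard deviation 2.
    print(chebyshev_lower_bound(12, 2, 8, 16))  # 0.75
```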

4.3. Alternative statement of Chebyshev’s inequality.

Theorem 11. Let X be a random variable and let $g(x)$ be a non-negative function. Then for $r > 0$,

\[
P\left[ g(X) \ge r \right] \le \frac{E g(X)}{r}
\tag{170}
\]

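Proof.

\[
\begin{aligned}
E g(X) &= \int_{-\infty}^{\infty} g(x) f_X(x)\,dx \\
&\ge \int_{[x\,:\,g(x) \ge r]} g(x) f_X(x)\,dx && (g \text{ is nonnegative}) \\
&\ge r \int_{[x\,:\,g(x) \ge r]} f_X(x)\,dx && (g(x) \ge r) \\
&= r\, P\left[ g(X) \ge r \right] \\
\Rightarrow P\left[ g(X) \ge r \right] &\le \frac{E g(X)}{r}
\end{aligned}
\tag{171}
\]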
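The inequality of Theorem 11 can be verified exactly for a small discrete distribution, where sums replace the integrals of the proof. The sketch below is an illustrative addition: the distribution (a fair die), the function $g(x) = x^2$, and the helper name are assumptions. Exact rational arithmetic is used so that the comparison involves no rounding.

```python
from fractions import Fraction


def general_chebyshev_sides(pmf: dict, g, r):
    """Return (P[g(X) >= r], E[g(X)] / r) for a discrete pmf {value: probability}."""
    prob = sum(p for x, p in pmf.items() if g(x) >= r)
    expected = sum(p * g(x) for x, p in pmf.items())
    return prob, expected / r


def square(x):
    """Non-negative function g(x) = x^2 used for the demonstration."""
    return x * x


# Assumed example distribution: a fair six-sided die.
die = {x: Fraction(1, 6) for x in range(1, 7)}

if __name__ == "__main__":
    for r in (4, 16, 30):
        prob, bound = general_chebyshev_sides(die, square, r)
        print(f"r={r}: P[g(X) >= r] = {prob}, E g(X)/r = {bound}")
```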



4.4. Another version of Chebyshev’s inequality as special case of general version.

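Corollary 1. Let X be a random variable with mean $\mu$ and variance $\sigma^2$. Then for any $k > 0$ or any $\varepsilon > 0$

\[
P\left[ |X - \mu| \ge k\sigma \right] \le \frac{1}{k^2}
\tag{172a}
\]

\[
P\left[ |X - \mu| \ge \varepsilon \right] \le \frac{\sigma^2}{\varepsilon^2}
\tag{172b}
\]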


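Proof. Let $g(x) = \frac{(x - \mu)^2}{\sigma^2}$, where $\mu = E(X)$ and $\sigma^2 = \mathrm{Var}(X)$. Then let $r = k^2$. Then

\[
P\left[ \frac{(X - \mu)^2}{\sigma^2} \ge k^2 \right] \le \frac{1}{k^2}\, E\left[ \frac{(X - \mu)^2}{\sigma^2} \right] = \frac{1}{k^2}\, \frac{E(X - \mu)^2}{\sigma^2} = \frac{1}{k^2}
\tag{173}
\]

because $E(X - \mu)^2 = \sigma^2$. We can then rewrite equation 173 as follows

\[
\begin{aligned}
P\left[ \frac{(X - \mu)^2}{\sigma^2} \ge k^2 \right] &\le \frac{1}{k^2} \\
\Rightarrow P\left[ (X - \mu)^2 \ge k^2 \sigma^2 \right] &\le \frac{1}{k^2} \\
\Rightarrow P\left[ |X - \mu| \ge k\sigma \right] &\le \frac{1}{k^2}
\end{aligned}
\tag{174}
\]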




REFERENCES

[1] Amemiya, T. Advanced Econometrics. Cambridge: Harvard University Press, 1985.

[2] Bickel, P.J., and K.A. Doksum. Mathematical Statistics: Basic Ideas and Selected Topics, Vol. 1. 2nd edition. Upper Saddle River, NJ: Prentice Hall, 2001.

[3] Billingsley, P. Probability and Measure. 3rd edition. New York: Wiley, 1995.

[4] Casella, G., and R.L. Berger. Statistical Inference. Pacific Grove, CA: Duxbury, 2002.

[5] Cramer, H. Mathematical Methods of Statistics. Princeton: Princeton University Press, 1946.

[6] Goldberger, A.S. Econometric Theory. New York: Wiley, 1964.

[7] Lindgren, B.W. Statistical Theory. 4th edition. Boca Raton, FL: Chapman & Hall/CRC, 1993.

[8] Rao, C.R. Linear Statistical Inference and its Applications. 2nd edition. New York: Wiley, 1973.

[9] Theil, H. Principles of Econometrics. New York: Wiley, 1971.