Notes on the MCS 341 text*
- Chapter 1: What is statistics?
- p. 2: "The large body of data that is the target of our interest
is called a population, and the subset selected from it is a
sample."
These definitions describe the population and sample as data.
An alternative perspective is offered by the MCS 142 text,
IPS 4/e p. 248: "The entire group of individuals that we want
information about is called the population.
A sample is a part of the population that we actually examine
in order to gather information."
In this perspective, the numerical data arise from variables whose
values are observed or measured for each individual.
Cf. the spreadsheet view of data.
- p. 4: Regarding histograms, the text reads,
"A rectangle is constructed over each interval, such that the
height of the rectangle is proportional to the
fraction of the total number of measurements falling in
each cell."
The word "height" (not underlined in the original) should be replaced
by the word "area".
- pp. 4-5 give guidelines for the construction of histograms.
The intervals are sometimes called "bins."
An alternative to the suggestion of using 5-20 intervals is the
suggestion of using about n^(1/2) bins (the square root of n),
where n is the number of data.
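Here's a small Python sketch of the two histogram points above: with unequal bin widths it is the area, not the height, that should equal the fraction of measurements in the bin, and the square-root rule gives a bin count. The data, the bin edges, and the function names are made up for illustration.

```python
import math

def density_histogram(data, edges):
    """Return rectangle heights chosen so that each rectangle's AREA equals
    the fraction of observations in its bin (bin widths may differ)."""
    n = len(data)
    heights = []
    for i, (lo, hi) in enumerate(zip(edges[:-1], edges[1:])):
        last = (i == len(edges) - 2)
        # Bins are [lo, hi), except the last bin, which also includes hi.
        count = sum(1 for x in data if lo <= x < hi or (last and x == hi))
        heights.append(count / n / (hi - lo))
    return heights

def sqrt_rule_bins(n):
    """Bin count suggested by the 'about n^(1/2)' rule of thumb."""
    return max(1, round(math.sqrt(n)))

data = [1, 2, 2, 3, 3, 3, 4, 7, 8, 10]   # made-up measurements
edges = [0, 2, 4, 10]                     # deliberately unequal widths
heights = density_histogram(data, edges)
areas = [h * (hi - lo) for h, lo, hi in zip(heights, edges[:-1], edges[1:])]
print(heights, areas, sqrt_rule_bins(len(data)))
```

The areas add up to 1 even though the third bin is three times as wide as the others; heights proportional to the raw fractions would exaggerate that bin.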
- p. 8: Means other than the arithmetic mean are certainly useful
and worth learning, as demonstrated in class.
- p. 9: In Definition 1.2, the variance of
y1, y2, ..., yn
considered as a population should be calculated by using
n in the denominator rather than n-1.
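A quick check of the two denominators, using Python's standard statistics module (the data values are made up):

```python
import statistics

y = [2, 4, 4, 4, 5, 5, 7, 9]            # made-up values; mean is 5

pop_var = statistics.pvariance(y)        # population variance: divides by n
samp_var = statistics.variance(y)        # sample variance: divides by n - 1

print(pop_var, samp_var)
```

The sum of squared deviations here is 32, so the population version gives 32/8 and the sample version gives 32/7.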
- p. 10: The "empirical rule" is really the rule for a normal
distribution. For a normal distribution, 99.7% of the values
lie within 3 standard deviations of the mean.
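The normal-distribution percentages can be computed directly from the error function, since P(|Z| < k) = erf(k / sqrt(2)) for a standard normal Z. A minimal sketch (the function name is my own):

```python
import math

def normal_within(k):
    """P(|Z| < k) for a standard normal Z, via the error function."""
    return math.erf(k / math.sqrt(2))

for k in (1, 2, 3):
    print(k, round(normal_within(k), 4))
```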
- Chapter 2: Probability
- p. 22: Note the subset symbol used in the text.
- p. 26: "The letter E with a subscript
will be used to denote a simple event or the corresponding sample
point."
Thus, a casual (sloppy?) identification is made between an
element Ei
and the set {Ei} containing the element.
- p. 27: The text speaks of "either a finite or countable number."
This seems to indicate that "countable" will mean
"countably infinite."
But for probability theory this is seldom a useful distinction,
because of the countable additivity postulate of probability.
I'll follow the practice of other probabilists in using the
word "countable" to mean "in 1-1 correspondence with the
set of counting numbers {1, 2, 3, ...} or a subset thereof."
Thus a
countable set could be finite or countably infinite.
This usage allows more economical phrasing.
- p. 52: Events A and B are independent
iff P(A and B) = P(A)P(B).
Events A and B are independent
if P(A|B) = P(A) or P(B|A) = P(B).
The definition of independent events given in the text
is not adequate for three or more events.
See Exercise 2.145 and the handout "Independent Events" or the
Word file.
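To see why the two-event definition is not enough, here is a standard example (not necessarily the one in Exercise 2.145 or the handout): toss a fair coin twice, and let A = "first toss is heads," B = "second toss is heads," C = "the two tosses agree." The sketch below enumerates the sample space; the helper names are my own.

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=2))     # four equally likely outcomes

def prob(event):
    """Probability of an event given as a predicate on outcomes."""
    return Fraction(sum(1 for w in omega if event(w)), len(omega))

def both(E, F):
    """The intersection of two events."""
    return lambda w: E(w) and F(w)

A = lambda w: w[0] == "H"
B = lambda w: w[1] == "H"
C = lambda w: w[0] == w[1]

pairs = [(A, B), (A, C), (B, C)]
pairwise = all(prob(both(E, F)) == prob(E) * prob(F) for E, F in pairs)
p_all = prob(lambda w: A(w) and B(w) and C(w))
product_rule = prob(A) * prob(B) * prob(C)

print(pairwise, p_all, product_rule)
```

Each pair satisfies the product rule, but P(A and B and C) = 1/4 while P(A)P(B)P(C) = 1/8, so the three events are pairwise independent without being mutually independent.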
- p. 64: You should memorize the formula for the sum of a
geometric series.
- pp. 67ff.: Bayes's rule
According to our style manual (Lunsford, The Everyday
Writer, 332), "Add an apostrophe and -s to form the
possessive
of most singular nouns, including those that end in -s ..."
- p. 75: Definition 2.13: What our text calls a "random sample"
was called a "simple random sample" (SRS) in MCS 140/142.
- Chapter 3: Discrete random variables and
their probability distributions
- p. 90: Theorem 3.2 is sometimes called the
"Law of the Unconscious Statistician."
I may abbreviate it "LotUS."
- Section 3.3: Learn the properties of expected values.
Note that expectation is a linear operator:
E(aX + bY) = aE(X) + bE(Y) if a and b are constants and X and Y
are random variables.
Unfortunately, our text does not deal with V(aX + bY)
until Section 5.8.
V(aX + bY) = a^2 V(X) + 2ab Cov(X,Y) + b^2 V(Y).
If X and Y are independent random variables, then the covariance
term is zero (Cov(X,Y) = 0) and
V(aX + bY) = a^2 V(X) + b^2 V(Y).
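The variance formula can be checked exactly on a small joint distribution. This is only a sketch: the joint pmf and the constants a, b below are made up, and exact fractions are used so the two sides can be compared with ==.

```python
from fractions import Fraction as Fr

# Made-up joint pmf of (X, Y); X and Y are NOT independent here.
pmf = {(0, 0): Fr(1, 4), (0, 1): Fr(1, 4), (1, 1): Fr(1, 2)}

def E(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in pmf.items())

def V(g):
    """Variance of g(X, Y), straight from the definition."""
    m = E(g)
    return E(lambda x, y: (g(x, y) - m) ** 2)

a, b = 3, -2
X = lambda x, y: x
Y = lambda x, y: y

cov = E(lambda x, y: x * y) - E(X) * E(Y)
lhs = V(lambda x, y: a * x + b * y)
rhs = a**2 * V(X) + 2 * a * b * cov + b**2 * V(Y)

print(cov, lhs, rhs)
```

Dropping the covariance term would give the wrong answer for this pmf, because Cov(X,Y) = 1/8 is not zero.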
- pp. 105-106: Example 3.10 offers a preview of a very important idea
in mathematical statistics: maximum likelihood estimation
of parameters.
- p. 125: The first three "equations" on this page are not really
equalities, but approximations.
Here's a better formulation of the Poisson process:
(1) Assume P(one incident occurs in a subinterval of length 1/n)
= p where p is approximately proportional to 1/n:
p = L/n + o(1/n). Here "L" (for "lambda") is a constant,
and "o(h)" ("little o of h") denotes some function of h that
approaches 0 faster than h when h approaches 0, i.e.,
o(h)/h --> 0 as h --> 0.
(2) Assume that
P(more than one incident occurs in a subinterval of length 1/n)
= o(1/n). (It's very small, but it's not exactly 0.)
(3) Assume also that the occurrences of incidents in disjoint intervals
are independent.
Let Y denote the number of incidents in the entire length-1 interval.
Then an adjustment of the argument on p. 125 shows
that P(Y = y) = e^(-L) L^y / y!.
- Generalization: For an interval of length T and
L = average rate of incidents per unit length, the probability of
y incidents in the interval is e^(-LT) (LT)^y / y!.
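A numerical illustration of the limit behind the Poisson formula. As a simplifying assumption, the sketch takes p = L/n exactly (ignoring the o(1/n) terms) and at most one incident per subinterval, so Y is binomial(n, L/n). The value L = 2 is made up.

```python
import math

def binom_pmf(y, n, p):
    """P(Y = y) for a binomial(n, p) count."""
    return math.comb(n, y) * p**y * (1 - p)**(n - y)

def poisson_pmf(y, lam):
    """P(Y = y) = e^(-lam) lam^y / y!."""
    return math.exp(-lam) * lam**y / math.factorial(y)

lam = 2.0
for n in (10, 100, 10_000):
    print(n, [round(binom_pmf(y, n, lam / n), 5) for y in range(4)])
print("Poisson", [round(poisson_pmf(y, lam), 5) for y in range(4)])
```

As n grows, the binomial rows settle down to the Poisson row.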
- Chapter 4: Continuous random variables
and their probability distributions
- p. 151: A common abbreviation for (cumulative) distribution
function is "cdf".
- p. 153: Theorem 4.1: Note the footnote on right continuity.
The four properties stated in the theorem and the footnote
are characterizing properties of distribution functions:
The cdf of every random variable satisfies these properties,
and, given any function F satisfying these four properties,
there exists a random variable Y for which
P(Y <= y) = F(y) for every y.
- p. 154: Definition 4.2: Footnote 2 is not right.
Definition 4.2 is perfectly good as it stands.
- p. 154: A common abbreviation for probability density function
is "pdf".
- p. 154: Definition 4.3: Actually a probability density
function is generally defined to be a nonnegative function
for which Theorem 4.3 is true,
or equivalently, for which the p. 154 formula
F(y) = integral of f(t) dt from negative infinity
to y is true for every real value y.
Then it follows from the Fundamental Theorem of Calculus
that F'(y) = f(y), at least where f is continuous.
- p. 154: Notice that a pdf could be redefined at finitely many
points (or even more) without changing the values of integrals of
the pdf, so strictly speaking, pdf's are not unique.
- p. 154: The footnote confuses continuous random variables
with random variables having a pdf.
The fact of the matter is that there are continuous random variables
that do not have probability density functions.
- p. 156: Iverson notation is handy for formulas
involving cases:
[statement] = 1 if statement is true;
[statement] = 0 if statement is false.
In Example 4.3 we could write
f(y) = 3y^2 [0 < y < 1].
- p. 163: Theorem 4.4 is the continuous pdf version of the
Law of the Unconscious Statistician.
- p. 163: Note the definition (it's not highlighted):
Variance of Y is
V(Y) = E[{Y - E(Y)}^2].
- p. 167: In Iverson notation, the pdf for the uniform
distribution on the interval (a, b) is given by
f(y) = [a < y < b]/(b - a).
Whether or not the endpoints are included is not important:
integrals of the pdf will not be affected.
(See the p. 154 note.)
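Iverson notation carries over almost verbatim to Python, where a comparison evaluates to True or False and these behave as 1 and 0 in arithmetic. A sketch of the Example 4.3 density and the uniform density written this way (the function names are my own):

```python
def f_example(y):
    """f(y) = 3y^2 [0 < y < 1]."""
    return 3 * y**2 * (0 < y < 1)

def f_uniform(y, a, b):
    """f(y) = [a < y < b] / (b - a), the uniform pdf on (a, b)."""
    return (a < y < b) / (b - a)

print(f_example(0.5), f_example(2), f_uniform(1, 0, 4), f_uniform(5, 0, 4))
```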
-
p. 188 describes situations modeled well by exponential and gamma
distributions.
-
p. 189: The moment-generating function m(t)
of a continuous r.v. having pdf f is the same as the
two-sided (bilateral) Laplace transform of f evaluated at
s = -t.
-
p. 190: The second paragraph makes explicit a technique for
evaluating certain definite integrals by relating them to
known pdf's (hence, to functions whose integral over the reals
is known to be 1).
-
Other useful continuous probability distributions:
- p. 204 #4.148: log-normal
- p. 206 #4.152: Weibull
- p. 207 #4.158: Maxwell
- p. 208 #4.164: Markov's inequality
(cf. Tchebysheff's inequality)
- Chapter 5: Multivariate probability
distributions
- pp. 213-214: Highlight the sentence before Definition 5.3;
it gives a correct definition of "jointly continuous":
"Two random variables are said to be jointly continuous if their
joint distribution function
F(y1, y2)
is continuous in both arguments."
Then in the first line on top of p. 214 strike out the words
"said to be".
The fact of the matter is that there are jointly continuous
random variables that do not have a joint probability density
function.
Thus, the set of pairs of r.v.s having a joint pdf is a proper subset
of the set of pairs of r.v.s that are jointly continuous.
- p. 214, Theorem 5.2 & following: Note that the expression
in point 3 gives a formula for the probability that
(Y1,Y2) is in the rectangle
(y1, y1*] x (y2, y2*].
-
p. 228, Definition 5.7: The notations
"f(y1 | y2)"
and
"f(y2 | y1)"
cannot be distinguished if actual numbers are substituted for
y1 and y2;
e.g., what is f(3 | 4)?
Is it f(3,4)/f2(4)
or f(4,3)/f1(4)?
One way to remedy this is to use subscripts telling you what
random variables are involved. It's cumbersome, but clarifying.
E.g., "f(y1 | y2)"
would be written
"f_{Y1|Y2}(y1 | y2)."
-
p. 234, Theorem 5.4: The second paragraph needs fixing.
Change "Y1 and Y2 are
independent if and only if
f(y1,y2) =
f1(y1)
f2(y2)
for all pairs of real numbers
(y1,y2)"
either by deleting "and only if" or by inserting the word
"almost" before "all pairs."
The problem is that pdf's that differ only on a set of area zero
(for instance, at finitely many points)
are equivalent with respect to integration.
"Almost all pairs" means all pairs except possibly a set of
area zero.
-
p. 236: Theorem 5.5 provides a useful test for independence.
-
p. 241: "Definition" 5.9 reveals our authors to be "unconscious
statisticians"! This is really a theorem, the Law of the
Unconscious Statistician.
-
p. 242: The example at least reveals consciousness of the
need to check consistency between "Definition" 5.9 and the
original definition of expected value.
-
p. 255: This exercise shows by a counterexample
that uncorrelatedness does not imply independence.
Also pp. 252-253 provide a discrete counterexample.
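For a quick discrete illustration (a standard one, not necessarily the text's), let X be uniform on {-1, 0, 1} and Y = X^2. The sketch below computes the covariance exactly and compares P(X = 0, Y = 0) with P(X = 0)P(Y = 0).

```python
from fractions import Fraction as Fr

pmf_X = {-1: Fr(1, 3), 0: Fr(1, 3), 1: Fr(1, 3)}
joint = {(x, x * x): p for x, p in pmf_X.items()}   # Y = X^2

def E(g):
    """Expected value of g(X, Y) under the joint pmf."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

cov = E(lambda x, y: x * y) - E(lambda x, y: x) * E(lambda x, y: y)

p_x0 = sum(p for (x, y), p in joint.items() if x == 0)
p_y0 = sum(p for (x, y), p in joint.items() if y == 0)
p_both = joint.get((0, 0), Fr(0))

print(cov, p_both, p_x0 * p_y0)
```

The covariance is 0, yet P(X = 0, Y = 0) = 1/3 differs from P(X = 0)P(Y = 0) = 1/9, so X and Y are uncorrelated but not independent.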
-
pp. 259-261: Example 5.29 calculates the mean and variance of
a hypergeometric r.v.
-
p. 270: "Definition 5.13," like Definition 5.9, mixes the
definition of conditional expected value with a
theorem, the Law of the Unconscious Statistician for
conditional expectations.
-
pp. 271-272: Example 5.32: An interesting feature of this
example is that it is a mix of discrete and continuous r.v.s.
The formula "E(Y) = E[E(Y|p)]" still holds, even though the
proof of Theorem 5.14 does not cover this case.
-
p. 272: Alternatively, the conditional variance may be defined
by V(Y1 | Y2) =
E({Y1 - E(Y1 | Y2)}^2 | Y2).
- Chapter 6: Functions of random
variables
-
A missing result: If X and Y are independent
r.v.s with respective pdfs f and g,
then the pdf of their sum U = X + Y
is given by f * g,
the convolution of f and g,
defined by
(f * g)(u) = integral from -infinity to infinity
of f(u-y)g(y) dy.
Special case:
When X and Y are nonnegative random variables,
so that their pdfs are zero for negative inputs,
(f * g)(u) = integral from 0 to u
of f(u-y)g(y) dy when u > 0.
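A numerical sketch of the special case for nonnegative r.v.s, using the midpoint rule. As a test case it takes X and Y to be independent exponentials with mean 1, whose sum is known to have the gamma density u e^(-u) for u > 0. The function names and the step count are my own choices.

```python
import math

def convolve_nonneg(f, g, u, steps=10_000):
    """Midpoint-rule approximation of the integral from 0 to u
    of f(u - y) g(y) dy; returns 0 when u <= 0."""
    if u <= 0:
        return 0.0
    h = u / steps
    return h * sum(f(u - (i + 0.5) * h) * g((i + 0.5) * h)
                   for i in range(steps))

def expo(y):
    """pdf of the exponential distribution with mean 1."""
    return math.exp(-y) if y >= 0 else 0.0

u = 1.5
approx = convolve_nonneg(expo, expo, u)
exact = u * math.exp(-u)        # gamma density with shape 2, scale 1
print(approx, exact)
```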
-
Also missing: The formulas for the pdfs of X - Y,
XY, and Y / X.
The last case was given in class.
-
pp. 284-288: Example 6.3: You should understand this derivation
of the pdf of Y1 + Y2
when Y1 and Y2 are
independent r.v.s uniformly distributed on [0,1]
and also contrast it with the convolution formula
f_{Y1 + Y2}(u) =
integral from -infinity to infinity of
[0 < u-y < 1]
[0 < y < 1] dy.
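The Iverson-bracket integral can be evaluated numerically and compared with the triangular density that the derivation in Example 6.3 arrives at (u on (0, 1], 2 - u on (1, 2)). A rough sketch; the function names and the step count are my own choices.

```python
def f_sum(u, steps=10_000):
    """Midpoint-rule value of the integral over y of
    [0 < u - y < 1][0 < y < 1]; the integrand vanishes outside 0 < y < 1."""
    h = 1.0 / steps
    total = 0
    for i in range(steps):
        y = (i + 0.5) * h
        total += (0 < u - y < 1) * (0 < y < 1)
    return h * total

def triangle(u):
    """Triangular density on (0, 2), written in Iverson style."""
    return u * (0 < u <= 1) + (2 - u) * (1 < u < 2)

for u in (0.25, 0.5, 1.0, 1.5, 2.5):
    print(u, round(f_sum(u), 3), triangle(u))
```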
-
pp. 290-291: The result stated in the last paragraphs may be
regarded as "The Fundamental Theorem of Simulation."
Here's the general theorem: Let F be any cdf.
For 0 < u < 1, let G be the "left-continuous
inverse" of F:
G(u) =
inf{ x | F(x) >= u}.
Let the r.v. U be uniformly distributed on (0,1).
Then the r.v. Y := G(U) has F
as its cdf.
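Here is a sketch of the theorem at work for a discrete cdf, where F has jumps and an ordinary inverse does not exist. It uses G(u) = inf{x : F(x) >= u}, the usual left-continuous quantile function. The distribution, the seed, and the function names are made up for illustration.

```python
import bisect
import random

def make_quantile(values, probs):
    """Return G(u) = inf{x : F(x) >= u} for a discrete distribution on the
    sorted list `values` with corresponding probabilities `probs`."""
    cum, total = [], 0.0
    for p in probs:
        total += p
        cum.append(total)                       # cum[i] = F(values[i])
    def G(u):
        i = bisect.bisect_left(cum, u)          # first index with F(x_i) >= u
        return values[min(i, len(values) - 1)]  # guard against rounding in cum
    return G

values, probs = [0, 1, 2], [0.25, 0.5, 0.25]
G = make_quantile(values, probs)

rng = random.Random(341)                        # fixed seed for repeatability
draws = [G(rng.random()) for _ in range(20_000)]
freq = [draws.count(v) / len(draws) for v in values]
print(freq)
```

The observed frequencies should come out close to 0.25, 0.5, 0.25. For a continuous, strictly increasing F the same recipe reduces to Y = F^(-1)(U).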
-
pp. 294-297: Also assume that h^(-1) is a
differentiable function.
-
p. 297: Easy-to-remember differential form of the method of
transformation:
f_U(u) |du| =
f_Y(y) |dy|
where u = h(y)
and y = h^(-1)(u).
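A sketch of the differential form in action: solving for f_U gives f_U(u) = f_Y(h^(-1)(u)) |dy/du|, with the absolute value needed when h is decreasing. The made-up example takes Y uniform on (0,1) and U = -ln Y, so y = e^(-u) and f_U(u) should be e^(-u) for u > 0. The derivative is approximated by a central difference; the function names are my own.

```python
import math

def transformed_pdf(f_Y, h_inv, u, eps=1e-6):
    """f_U(u) = f_Y(h_inv(u)) * |d h_inv / du|, with the derivative
    approximated by a central difference of half-width eps."""
    dy_du = (h_inv(u + eps) - h_inv(u - eps)) / (2 * eps)
    return f_Y(h_inv(u)) * abs(dy_du)

f_Y = lambda y: float(0 < y < 1)      # uniform(0,1) pdf, Iverson style
h_inv = lambda u: math.exp(-u)        # inverse of h(y) = -ln y (decreasing)

for u in (0.5, 1.0, 2.0):
    print(u, transformed_pdf(f_Y, h_inv, u), math.exp(-u))
```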
-
p. 313: In the formula for f(y2),
change y1 to y2.
-
p. 317: An alternative notation for the kth
order statistic Y_(k) is
Y_{k:n}.
- Chapter 7: Sampling distributions
and the Central Limit Theorem
-
The Central Limit Theorem (CLT) is one of the top ten theorems
of all time.
-
p. 348: Of course, the variance
V(Yi) must be positive, too.
-
p. 352: Theorem 7.5 is known as the continuity theorem for
mgfs.
-
Section 7.4: The proof given here is not completely general,
because not all distributions satisfying the hypotheses of the
CLT have moment generating functions, i.e., ones defined
over nondegenerate intervals.
-
pp. 362-363: Exercises 7.72 and 7.73 give formulas for the
t and F densities.
*Mathematical Statistics with Applications, 6/e,
Dennis D. Wackerly, William Mendenhall III, and Richard L. Scheaffer,
Duxbury, 2002.