Estimation, Margins of Error and Confidence Intervals

EC031-S26

Author

Aleksandr Michuda

Agenda

  • The properties of point estimators
  • What is an interval estimator?
  • What is a margin of error?
  • What is a confidence interval?
  • Why are these necessary for inference?

Estimators

  • An estimator is a function of the sample data that is used to estimate a population parameter.
  • A point estimator is a single value that is used to estimate a population parameter.
  • An interval estimator is an interval in which the population parameter is likely to fall.

What makes an estimator “good?”

  • There are often many different estimators for a single population parameter.

Finite Sample Properties

  1. Unbiasedness: Does the estimator deliver the “right” answer on average (if we were to repeat the experiment many times)?

\[ E(\hat{\theta}) = \theta \]

  2. Efficiency: Does the estimator have the smallest variance of all possible estimators?

Asymptotic Properties

  1. Asymptotically unbiased: Does the estimator converge to the true value as the sample size increases?
  2. Consistent: As \(n \rightarrow \infty\), the estimator converges in probability to the true value.
  3. Asymptotically efficient: Does the estimator have the smallest variance of all possible estimators as \(n \rightarrow \infty\)?

What makes an estimator “good?”

There are many estimators for a single population parameter.

[Figure: simulated sampling distribution of the estimator \(\hat{\theta} = \bar{X}\); mean of the estimates: 0.022]
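A figure like the one above can be produced with a short simulation; this is a minimal sketch, assuming a standard normal population (\(\mu = 0\)) and samples of size 30 (both choices are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Draw 10,000 samples of size 30 from a standard normal population (mu = 0)
samples = rng.normal(loc=0.0, scale=1.0, size=(10_000, 30))

# The point estimator theta_hat = X-bar, computed once per sample
theta_hat = samples.mean(axis=1)

# The estimates cluster around the true mean of 0
print(round(theta_hat.mean(), 3))
```

Plotting a histogram of `theta_hat` gives the sampling distribution shown in the figure.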

Unbiasedness

\[ E(\hat{\theta}) = \theta \]

  • The estimator produces estimates that are centered around the true population parameter.
  • If we sampled repeatedly and calculated the estimator each time, the average of those estimates would equal the true population parameter.

Is the sample mean an unbiased estimator of the population mean?

  • Recall that the expectation is a linear operator.

\[ E(\bar{X}) = E(\frac{1}{n} \sum_{i=1}^{n} X_i) = \frac{1}{n} \sum_{i=1}^{n} E(X_i) = \frac{1}{n} \sum_{i=1}^{n} \mu = \mu \]

What about the other estimators we talked about?

\[ E(\hat{\theta}) = E(\frac{X_1 + X_2}{2}) \]

\[ E(\hat{\theta}) = E(\frac{X_1 + X_2}{2}) = \frac{1}{2}E(X_1) + \frac{1}{2}E(X_2) = \mu \]

\[ E(\hat{\theta}) = E(\frac{X_1 + X_2 + 2X_3}{4}) \]

\[ E(\hat{\theta}) = E(\frac{X_1 + X_2 + 2X_3}{4}) = \frac{1}{4}E(X_1) + \frac{1}{4}E(X_2) + \frac{1}{2}E(X_3) = \mu \]

What about one more?

\[ E(\hat{\theta}) = E(\frac{X_1 + X_2 + 3X_3}{4}) \]

\[ E(\hat{\theta}) = E(\frac{X_1 + X_2 + 3X_3}{4}) = \frac{1}{4}E(X_1) + \frac{1}{4}E(X_2) + \frac{3}{4}E(X_3) = \frac{5}{4} \mu \]

Biased!
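The bias shows up numerically as well. A small simulation sketch, with a hypothetical true mean \(\mu = 5\) and \(\sigma = 2\):

```python
import numpy as np

rng = np.random.default_rng(1)
mu = 5.0  # hypothetical true population mean

# Many samples of size 3 from a population with mean mu
X = rng.normal(loc=mu, scale=2.0, size=(200_000, 3))

# The weighted estimator (X1 + X2 + 3*X3) / 4
theta_hat = (X[:, 0] + X[:, 1] + 3 * X[:, 2]) / 4

# Its average is close to (5/4) * mu = 6.25, not mu -> biased
print(round(theta_hat.mean(), 2))
```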

Efficiency

  • An estimator is efficient if it has the smallest variance of all possible estimators.
  • The variance of an estimator is a measure of how much the estimates from the estimator vary from the true population parameter.

What’s the variance of all of our estimators?

Variance of the Sample Mean

  • We already know that the variance of \(\bar{X} = \frac{\sigma^2}{n}\).
  • Let’s actually calculate that.
  • For this we need to know two properties of the variance:
    • \(Var(aX) = a^2 Var(X)\) for any constant \(a\)
    • \(Var(X + Y) = Var(X) + Var(Y)\) when \(X\) and \(Y\) are independent

Variance of the Sample Mean

\[ Var(\bar{X}) = Var(\frac{1}{n} \sum_{i=1}^{n} X_i) = \frac{1}{n^2} \sum_{i=1}^{n} Var(X_i) = \frac{1}{n^2} \sum_{i=1}^{n} \sigma^2 = \frac{\sigma^2}{n} \]

What assumption are we making between the \(X_i\)?
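The derivation assumes the \(X_i\) are independent, so that their variances add. A quick simulation sketch, assuming a normal population with hypothetical \(\sigma = 3\) and \(n = 25\):

```python
import numpy as np

rng = np.random.default_rng(2)
sigma, n = 3.0, 25

# Independent draws: Var(X-bar) should be sigma^2 / n = 9/25 = 0.36
X = rng.normal(loc=0.0, scale=sigma, size=(200_000, n))
print(round(X.mean(axis=1).var(), 3))
```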

Variance of the Other Estimators

\[ Var(\hat{\theta}) = Var(\frac{X_1 + X_2}{2}) \]

\[ Var(\hat{\theta}) = Var(\frac{X_1 + X_2}{2}) = \frac{1}{4}Var(X_1) + \frac{1}{4}Var(X_2) = \frac{\sigma^2}{2} \]

\[ Var(\hat{\theta}) = Var(\frac{X_1 + X_2 + 2X_3}{4}) \]

\[ Var(\hat{\theta}) = Var(\frac{X_1 + X_2 + 2X_3}{4}) = \frac{1}{16}Var(X_1) + \frac{1}{16}Var(X_2) + \frac{4}{16}Var(X_3) = \frac{6\sigma^2}{16} \]

Variance of the Other Estimators

\[ Var(\hat{\theta}) = Var(\frac{X_1 + X_2 + 3X_3}{4}) \]

\[ Var(\hat{\theta}) = Var(\frac{X_1 + X_2 + 3X_3}{4}) = \frac{1}{16}Var(X_1) + \frac{1}{16}Var(X_2) + \frac{9}{16}Var(X_3) = \frac{11\sigma^2}{16} \]

Which is lowest?

Below is a table of all of the variances we calculated:

Estimator                          Variance
\(\bar{X}\)                        \(\frac{\sigma^2}{n}\)
\(\frac{X_1 + X_2}{2}\)            \(\frac{\sigma^2}{2}\)
\(\frac{X_1 + X_2 + 2X_3}{4}\)     \(\frac{6\sigma^2}{16}\)
\(\frac{X_1 + X_2 + 3X_3}{4}\)     \(\frac{11\sigma^2}{16}\)

For the same \(n\), the mean is always the lowest variance estimator!
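A simulation sketch can confirm the table; it assumes a standard normal population (\(\sigma^2 = 1\)) and, for the sample mean, \(n = 3\):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(0.0, 1.0, size=(500_000, 3))

# Theoretical variances: 1/3, 1/2, 6/16, 11/16
estimators = {
    "sample mean (n=3)": X.mean(axis=1),
    "(X1+X2)/2": (X[:, 0] + X[:, 1]) / 2,
    "(X1+X2+2X3)/4": (X[:, 0] + X[:, 1] + 2 * X[:, 2]) / 4,
    "(X1+X2+3X3)/4": (X[:, 0] + X[:, 1] + 3 * X[:, 2]) / 4,
}
for name, est in estimators.items():
    print(f"{name}: {est.var():.3f}")
```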

How do you choose between estimators?

  • Sometimes, though, it isn’t obvious how to choose
  • One estimator is biased but has lower variance
  • Another is unbiased but high variance
  • Often to decide, we specify a loss function
    • A function that maps deviations of the estimator \(\hat{\theta}\) from the true value \(\theta\) to a loss
  • Then choose the estimator that minimizes the loss function

MSE

  • A very common loss function is Mean Squared Error (MSE)
  • The MSE of an estimator is the expected value of the squared difference between the estimator and the true value of the parameter

\[ MSE(\hat{\theta}) = E((\hat{\theta} - \theta)^2) \]

If you do some math, you can find that the MSE can be decomposed into two parts:

\[ MSE(\hat{\theta}) = Var(\hat{\theta}) + Bias(\hat{\theta})^2 \]

\[ MSE(\hat{\theta}) = E((\hat{\theta} - \theta)^2) \]

\[ = E((\hat{\theta} - E(\hat{\theta}) + E(\hat{\theta}) - \theta)^2) \]

\[ = E((\hat{\theta} - E(\hat{\theta}))^2) + (E(\hat{\theta}) - \theta)^2 + 2(E(\hat{\theta}) - \theta)E(\hat{\theta} - E(\hat{\theta})) \]

The cross term vanishes because \(E(\hat{\theta} - E(\hat{\theta})) = 0\), so:

\[ = Var(\hat{\theta}) + Bias(\hat{\theta})^2 \]
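The decomposition can be checked numerically. This sketch reuses the biased weighted estimator from earlier, with a hypothetical \(\mu = 2\):

```python
import numpy as np

rng = np.random.default_rng(4)
mu = 2.0  # hypothetical true mean
X = rng.normal(mu, 1.0, size=(300_000, 3))

# Biased estimator: (X1 + X2 + 3*X3) / 4, with E = (5/4) * mu
theta_hat = (X[:, 0] + X[:, 1] + 3 * X[:, 2]) / 4

mse = ((theta_hat - mu) ** 2).mean()
var = theta_hat.var()
bias = theta_hat.mean() - mu

# MSE = Var + Bias^2 holds exactly (up to floating point) by the identity above
print(round(mse, 4), round(var + bias**2, 4))
```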

Margin of Error and Interval Estimate

  • A point estimator always comes with some measure of error
  • An interval estimate can be computed by adding and subtracting the margin of error to the point estimate

\[ \text{Estimate} \pm \text{Margin of Error} \]

  • Its purpose is to provide a range of values that is likely to contain the population parameter.

An interval estimate of \(\mu\)

  • In order to do this, we need to set a “significance level” \(\alpha\).
  • This is the probability that \(\mu\) falls outside the interval
  • The confidence level is \(1 - \alpha\)
  • \(\alpha\) is also known as the Type I error rate
  • This interval will then be known as a “confidence interval”

Two-sided Interval Estimates of the Population Mean

  • Let’s use the sample mean to estimate the population mean and then construct a confidence interval around it
  • We use the fact that \(\bar{X}\) is normally distributed with mean \(\mu\) and variance \(\frac{\sigma^2}{n}\)
  • Meaning that the z-score of \(\bar{X}\) is \(\frac{\bar{X} - \mu}{\frac{\sigma}{\sqrt{n}}} \sim N(0,1)\)

The question we want to answer is, what do the lower and upper bounds of the confidence interval have to be in order for me to be \(1-\alpha\) confident that the true population mean is within this interval?

\[ P(LB \leq \mu \leq UB) = 1 - \alpha \]

Since we know that \(z\) is a standard normal distribution, we can use the quantiles of the standard normal distribution to find the bounds.

So we want to find a symmetric interval around the sample mean such that:

\[ P(-z^* \leq z \leq z^*) = 1 - \alpha \]

We can get a symmetric bound by just taking \(\alpha/2\) on each side.

Once I find \(z^*\), I can define the probability fully:

\[ P(-z^*_{\alpha/2} \leq \frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}} \leq z^*_{\alpha/2}) = 1 - \alpha \]

Important

In this case, we are going to assume that we know \(\sigma\)!

The confidence interval

  • \(z^*\) is called the critical value

Now, with some manipulation, we can find the confidence interval:

\[ P(-z^*_{\alpha/2} \leq \frac{\bar{x}-\mu}{\frac{\sigma}{\sqrt{n}}} \leq z^*_{\alpha/2}) = 1 - \alpha \]

\[ P(-z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}} \leq \bar{x}-\mu \leq z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}}) = 1 - \alpha \]

\[ P(\bar{x} - z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}} \leq \mu \leq \bar{x} + z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}}) = 1 - \alpha \]

\[ CI = \bar{x} \pm z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]
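The formula translates directly into code. A minimal sketch using `scipy.stats.norm` for the critical value (the function name and example numbers are illustrative):

```python
from math import sqrt
from scipy.stats import norm

def z_confidence_interval(xbar, sigma, n, alpha):
    """Two-sided (1 - alpha) confidence interval for mu when sigma is known."""
    z_star = norm.ppf(1 - alpha / 2)   # critical value z*_{alpha/2}
    margin = z_star * sigma / sqrt(n)  # margin of error
    return xbar - margin, xbar + margin

# Hypothetical numbers: xbar = 10, sigma = 4, n = 100, 95% confidence
lo, hi = z_confidence_interval(10, 4, 100, 0.05)
print(round(lo, 2), round(hi, 2))
```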

Choosing \(\alpha\)

Example

Suppose \(x\) is a random variable reflecting the hours of sleep a student gets per week. \(x\) has an unknown \(\mu\) and a known \(\sigma\) of 16.

Suppose you take a random sample of 64 students and they averaged 45 hours of sleep per week.

Construct a 90% confidence interval for \(\mu\)

\[ CI = \bar{x} \pm z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

\[ CI = 45 \pm 1.645 \frac{16}{\sqrt{64}} \]

\[ CI = 45 \pm 3.29 \]

Note that as \(n\) increases, the confidence interval narrows!
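A quick check of the example's arithmetic, and of how the margin of error shrinks with \(n\) (the larger sample sizes are hypothetical):

```python
from math import sqrt
from scipy.stats import norm

z_star = norm.ppf(0.95)  # 90% two-sided -> alpha/2 = 0.05, z* ~= 1.645

# Margin of error for the sleep example at several sample sizes
for n in (64, 256, 1024):
    print(n, round(z_star * 16 / sqrt(n), 2))
```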

One-sided Confidence Intervals

  • Sometimes we only care about getting an upper or lower bound on \(\mu\).
  • In this case, we are searching for a \(z^*\) such that:

\[ P(z \leq z^*) = 1 - \alpha \]

So all of the probability \(\alpha\) sits on one side, and the critical value is \(z^*_{\alpha}\) rather than \(z^*_{\alpha/2}\).

Our confidence interval is then:

\[ [\bar{x} - z^*_{\alpha} \frac{\sigma}{\sqrt{n}}, \infty) \]

or

\[ (-\infty, \bar{x} + z^*_{\alpha} \frac{\sigma}{\sqrt{n}}] \]

Example

Suppose \(x\) is a random variable reflecting the hours of sleep a student gets per week. \(x\) has an unknown \(\mu\) and a known \(\sigma\) of 16. Suppose you take a random sample of 64 students and they averaged 45 hours of sleep per week. Construct a 90% right-sided confidence interval for \(\mu\).

\[ CI = [45 - 1.28 \frac{16}{\sqrt{64}}, \infty) \]

\[ CI = [45 - 2.56, \infty) = [42.44, \infty) \]
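The same lower bound, computed with `scipy.stats.norm`:

```python
from math import sqrt
from scipy.stats import norm

# One-sided: all of alpha = 0.10 goes on one side, so z* = norm.ppf(0.90) ~= 1.28
z_star = norm.ppf(0.90)
lower = 45 - z_star * 16 / sqrt(64)
print(round(lower, 2))
```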

The \(\sigma\) problem

  • But I’ve been making it very clear that we almost never know \(\sigma\).
  • So what do we do?
  • Instead, we use the sample standard deviation \(s\) as an estimate for \(\sigma\)

\[ s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2 \]

Why this?

Because we know that \(s^2\) is an unbiased estimator of \(\sigma^2\)!

Show that \(E(s^2) = \sigma^2\)

\[ E(s^2) = E(\frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2) \]

\[ = \frac{1}{n-1} \sum_{i=1}^{n} E((X_i - \bar{X})^2) \]

\[ = \frac{1}{n-1} \sum_{i=1}^{n} E((X_i - \mu + \mu - \bar{X})^2) \]

\[ = \frac{1}{n-1} \sum_{i=1}^{n} E((X_i - \mu)^2 + 2(X_i - \mu)(\mu - \bar{X}) + (\mu - \bar{X})^2) \]

\[ = \frac{1}{n-1} \sum_{i=1}^{n} \left[ E((X_i - \mu)^2) + 2E((X_i - \mu)(\mu - \bar{X})) + E((\mu - \bar{X})^2) \right] \]

Note that the cross term cannot be factored into \(E(X_i - \mu)E(\mu - \bar{X})\), because \(X_i\) and \(\bar{X}\) are not independent.

Finish this…

The \(t\) distribution

  • But notice that in the switch from \(\sigma\) to \(s\), we go from a parameter to an estimate.
  • When we switch from a parameter to an estimate, we lose a degree of freedom.
Tip

The statistic using \(s\) is actually the ratio of a standard normal random variable to the square root of a \(\chi^2\) random variable divided by its degrees of freedom. This ratio follows the \(t\) distribution.

So rather than using a normal distribution to calculate critical values, we need to use a t-distribution with \(n-1\) degrees of freedom.
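The difference is easy to see by comparing critical values from `scipy.stats` (the sample sizes below are illustrative):

```python
from scipy.stats import norm, t

# 95% two-sided critical values: t is wider than z for small samples,
# and approaches z as the degrees of freedom grow
z_star = norm.ppf(0.975)
t_stars = {n: t.ppf(0.975, df=n - 1) for n in (5, 30, 1000)}
for n, t_star in t_stars.items():
    print(n, round(t_star, 3))
print("z:", round(z_star, 3))
```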

The \(t\) distribution

  • The t-distribution was first described by William Sealy Gosset in 1908.
  • Gosset worked for the Guinness Brewery in Dublin, Ireland, and was interested in the quality control of stout.
  • Gosset discovered the t-distribution while working on small samples of data.
    • He was unable to use the normal distribution because it requires the population standard deviation, which is unknown in most cases.
  • He later published his work under the pseudonym “Student” because Guinness did not allow employees to publish research.

The \(t\) distribution

Z-tables and t-tables

[Table: critical values of the \(t\) distribution]

Proportions

  • We can also construct confidence intervals for proportions
  • A proportion is a ratio of the number of successes to the total number of trials
  • The sample proportion is \(\bar{p} = \frac{\sum_i x_i}{n}\)
    • Where \(x_i\) is a set of 1s and 0s, or successes and failures coming from a Bernoulli distribution with parameter \(p\).
  • The standard error of the sample proportion is \(\sqrt{\frac{p(1-p)}{n}}\)
  • The sampling distribution of \(\bar{p}\) can be approximated by a normal distribution
    • provided that \(np > 5\) and \(n(1-p) > 5\)

Proportions

  • We can show that \(E(\bar{p})=p\) and \(Var(\bar{p}) = \frac{p(1-p)}{n}\)

\[ E(\bar{p})=E(\frac{\sum_i x_i}{n}) = \frac{1}{n}\sum_i E(x_i) = \frac{1}{n} np = p \]
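The variance follows the same way; since the \(x_i\) are independent Bernoulli draws with \(Var(x_i) = p(1-p)\):

\[ Var(\bar{p}) = Var(\frac{\sum_i x_i}{n}) = \frac{1}{n^2} \sum_i Var(x_i) = \frac{1}{n^2} \cdot n p(1-p) = \frac{p(1-p)}{n} \]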

Example

Political Science Inc. (PSI) specializes in voter polls and surveys designed to keep political office seekers informed of their position in a race.

Using telephone surveys, PSI interviewers ask registered voters who they would vote for if the election were held that day.

In a current election campaign, PSI has just found that 220 registered voters, out of 500 contacted, favor a particular candidate. PSI wants to develop a 95% confidence interval estimate for the proportion of the population of registered voters that favor the candidate.

Example

The confidence interval is:

\[ CI = \bar{p} \pm z^*_{\alpha/2} \sqrt{\frac{\bar{p}(1-\bar{p})}{n}} \]

\[ CI = 0.44 \pm 1.96 \sqrt{\frac{0.44(1-0.44)}{500}} \]

\[ CI = 0.44 \pm 0.044 \]
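A quick check of the PSI example's arithmetic:

```python
from math import sqrt
from scipy.stats import norm

p_bar = 220 / 500         # sample proportion, 0.44
z_star = norm.ppf(0.975)  # 95% -> z* ~= 1.96
margin = z_star * sqrt(p_bar * (1 - p_bar) / 500)
print(round(margin, 3))   # ~0.044
```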

Computing the necessary sample size for a given margin of error

  • Suppose we want to estimate the population mean with a margin of error of \(m\)
  • We can use the formula for the margin of error:

\[ m = z^*_{\alpha/2} \frac{\sigma}{\sqrt{n}} \]

Meaning that we can solve for \(n\):

\[ n = (\frac{z^*_{\alpha/2} \sigma}{m})^2 \]

This is what’s called a power analysis, which is very popular when designing experiments.

Again we have the \(\sigma\) problem; in this case, the only way to get around it is to use some value that makes sense based on the context.

You might:

  1. Use the estimate of the population standard deviation computed in a previous study.
  2. Use a pilot study to select a preliminary sample and use the sample standard deviation from the study.
  3. Use judgment or a “best guess” for the value of \(\sigma\).
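The sample-size formula is a one-liner; this sketch rounds up to the next whole observation, and the \(\sigma\) and margin values are a hypothetical "best guess" scenario:

```python
from math import ceil, sqrt
from scipy.stats import norm

def required_n(sigma, margin, alpha=0.05):
    """Smallest n so the (1 - alpha) margin of error is at most `margin`."""
    z_star = norm.ppf(1 - alpha / 2)
    return ceil((z_star * sigma / margin) ** 2)

# Hypothetical: sigma guessed at 16, want a 95% margin of error of 2 hours
print(required_n(sigma=16, margin=2))
```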