EC031-S26
Note: The problems may differ based on the edition of the textbook you have.
Briefly explain the difference between \(b_1(\text{OLS})\) and \(\beta_1\); between the residual, \(e_i\), and the regression error, \(\epsilon_i\); and between the OLS predicted value, \(\hat{y}_i\), and \(E(Y_i|X)\).
Solution
\(\beta_1\) is the true, population-level slope parameter representing the change in the expected value of \(y\) associated with a 1-unit increase in \(x\), while \(b_1\) is the estimate of \(\beta_1\) from an OLS regression of sample values of \(y_i\) on \(x_i\).
The residual, \(e_i\), is the difference between the actual observed value of \(y_i\) and the \(y\)-value the regression estimates would predict for that individual or observation, based on their/its value of \(x\). (That predicted value is \(\hat{y}_i=b_0+b_1 x_i\).) The regression error term, \(\epsilon_i\), reflects the fact that the population-level relationship between \(x\) and \(y\) is not fully deterministic. Said differently, it represents the unobserved random component of \(y\) that is not captured or explained by variation in \(x\).
The OLS predicted value, \(\hat{y}_i\), is the value of \(y\) that the regression estimates would predict for a given value, \(x_i\). \(E(y_i \mid x)\) is the population-level average, or expected value, of \(y\) conditional on \(x\) (i.e., for a given value of \(x\)).
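A short simulation can make these distinctions concrete. The sketch below (not part of the problem set; the population parameters \(\beta_0=1\), \(\beta_1=2\), the error distribution, and the sample size are illustrative assumptions) generates data from a known population model, so the true errors \(\epsilon_i\) are visible, and then contrasts them with the OLS estimates and residuals.

```python
# Illustrative sketch: simulate y_i = beta0 + beta1*x_i + eps_i with known
# population parameters, then compare them with the OLS estimates b0, b1.
# All specific numbers (beta0=1, beta1=2, n=500) are assumptions for the demo.
import random

random.seed(42)
beta0, beta1, n = 1.0, 2.0, 500
x = [random.uniform(0, 10) for _ in range(n)]
eps = [random.gauss(0, 1) for _ in range(n)]            # regression errors (unobserved in practice)
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, eps)]

xbar = sum(x) / n
ybar = sum(y) / n
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sxy / sxx                                          # OLS estimate of beta1
b0 = ybar - b1 * xbar                                   # OLS estimate of beta0
resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]   # residuals e_i (observable)

print(b0, b1)        # close to, but not equal to, beta0 and beta1
print(sum(resid))    # OLS residuals sum to (numerically) zero by construction
```

The residuals \(e_i\) differ from the errors \(\epsilon_i\) because they are measured from the estimated line \(b_0+b_1x_i\), not the true line \(\beta_0+\beta_1x_i\).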
ASW 10.38
Solution
\[ H_0: \mu_1-\mu_2=0 \]
\[ H_{\mathrm{a}}: \mu_1-\mu_2 \neq 0 \]
\[ z=\frac{\left(\bar{x}_1-\bar{x}_2\right)-D_0}{\sqrt{\frac{\sigma_1^2}{n_1}+\frac{\sigma_2^2}{n_2}}}=\frac{(4.1-3.4)-0}{\sqrt{\frac{(2.2)^2}{120}+\frac{(1.5)^2}{100}}}=2.79 \]
\[ p \text {-value }=2(1.0000-.9974)=.0052 \]
\(p\)-value \(\leq .05\), reject \(H_0\). A difference exists with system B having the lower mean checkout time.
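The z statistic and p-value above can be checked numerically. This is a sketch using only the sample summaries stated in the solution (means 4.1 and 3.4, standard deviations 2.2 and 1.5, sample sizes 120 and 100); `math.erfc` gives the two-tailed normal tail probability without a z table.

```python
# Sketch: reproduce the two-sample z statistic and two-tailed p-value
import math

x1bar, x2bar = 4.1, 3.4
s1, s2 = 2.2, 1.5
n1, n2 = 120, 100

z = (x1bar - x2bar) / math.sqrt(s1**2 / n1 + s2**2 / n2)
p_value = math.erfc(z / math.sqrt(2))   # two-tailed: 2 * P(Z > z)
print(round(z, 2), round(p_value, 4))
```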
ASW 10.45
Solution
\[ \begin{aligned} & \bar{p}_1=9 / 142=.0634 \\ & \bar{p}_2=5 / 268=.0187 \\ & \bar{p}=\frac{n_1 \bar{p}_1+n_2 \bar{p}_2}{n_1+n_2}=\frac{9+5}{142+268}=.0341 \\ & z=\frac{\bar{p}_1-\bar{p}_2}{\sqrt{\bar{p}(1-\bar{p})\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}}=\frac{.0634-.0187}{\sqrt{.0341(1-.0341)\left(\frac{1}{142}+\frac{1}{268}\right)}}=2.37 \\ & p \text {-value }=2(1.0000-.9911)=.0178 \end{aligned} \]
\(p\)-value \(\leq .02\), reject \(H_0\). There is a significant difference in drug resistance between the two states. Alabama has the higher drug-resistance rate.
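The pooled two-proportion test above can be replicated in a few lines. This sketch uses only the counts stated in the solution (9 of 142 and 5 of 268):

```python
# Sketch: pooled two-proportion z test matching the computation above
import math

n1, n2 = 142, 268
p1, p2 = 9 / n1, 5 / n2
p_pool = (9 + 5) / (n1 + n2)            # pooled proportion under H0

se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = (p1 - p2) / se
p_value = math.erfc(z / math.sqrt(2))   # two-tailed: 2 * P(Z > z)
print(round(z, 2), round(p_value, 4))
```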
ASW 11.23
Solution
\[ \begin{aligned} & \frac{(19)(900)}{30.144} \leq \sigma^2 \leq \frac{(19)(900)}{10.117} \\ & 567 \leq \sigma^2 \leq 1,690 \end{aligned} \]
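The interval endpoints follow from \((n-1)s^2/\chi^2\) with the tabled critical values for \(df=19\). A sketch of the arithmetic (the critical values 30.144 and 10.117 are taken directly from the solution, i.e., \(\chi^2_{.025}\) and \(\chi^2_{.975}\) with 19 degrees of freedom):

```python
# Sketch: 95% confidence interval for sigma^2 from s^2 = 900, n = 20
df, s2 = 19, 900
chi2_upper, chi2_lower = 30.144, 10.117  # tabled chi-square critical values, df = 19

lower = df * s2 / chi2_upper
upper = df * s2 / chi2_lower
print(round(lower), round(upper))        # interval endpoints for sigma^2
```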
ASW 11.29
Solution
\[ \begin{gathered} s^2=\frac{\Sigma\left(x_i-\bar{x}\right)^2}{n-1}=\frac{101.56}{9-1}=12.69 \\ H_0: \sigma^2=10 \\ H_{\mathrm{a}}: \sigma^2 \neq 10 \\ \chi^2=\frac{(n-1) s^2}{\sigma^2}=\frac{(9-1)(12.69)}{10}=10.16 \end{gathered} \]
Degrees of freedom \(=n-1=8\). Using the \(\chi^2\) table, the area in the tail is greater than .10, so the two-tailed \(p\)-value is greater than .20. The exact two-tailed \(p\)-value corresponding to \(\chi^2=10.16\) is .5080. Because the \(p\)-value \(>.10\), do not reject \(H_0\).
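The test statistic can be verified from the sums given in the solution (\(\Sigma(x_i-\bar{x})^2 = 101.56\), \(n = 9\)):

```python
# Sketch: chi-square statistic for H0: sigma^2 = 10
ss, n = 101.56, 9      # sum of squared deviations and sample size from the solution
sigma2_0 = 10          # hypothesized variance

s2 = ss / (n - 1)                 # sample variance
chi2 = (n - 1) * s2 / sigma2_0    # chi-square test statistic, df = n - 1
print(round(s2, 2), round(chi2, 2))
```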
ASW 14.55
Solution
No. Regression or correlation analysis can never prove that two variables are causally related.
ASW 14.1
Solution
There appears to be a positive linear relationship between \(x\) and \(y\).
Many different straight lines can be drawn to provide a linear approximation of the relationship between \(x\) and \(y\); in part d we will determine the equation of a straight line that “best” represents the relationship according to the least squares criterion.
\[ \bar{x}=\frac{\Sigma x_i}{n}=\frac{15}{5}=3 \quad \bar{y}=\frac{\Sigma y_i}{n}=\frac{40}{5}=8 \]
\[ \Sigma\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)=26 \quad \Sigma\left(x_i-\bar{x}\right)^2=10 \]
\[ b_1=\frac{\Sigma\left(x_i-\bar{x}\right)\left(y_i-\bar{y}\right)}{\Sigma\left(x_i-\bar{x}\right)^2}=\frac{26}{10}=2.6 \]
\[ b_0=\bar{y}-b_1 \bar{x}=8-(2.6)(3)=0.2 \]
\[ \hat{y}=0.2+2.6 x \]
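The slope and intercept follow mechanically from the summary quantities computed above (\(\bar{x}=3\), \(\bar{y}=8\), \(\Sigma(x_i-\bar{x})(y_i-\bar{y})=26\), \(\Sigma(x_i-\bar{x})^2=10\)); the raw data points themselves are in the textbook problem and are not reproduced here.

```python
# Sketch: least squares slope and intercept from the summary statistics above
xbar, ybar = 3, 8
sxy, sxx = 26, 10   # sum of cross-products and sum of squared deviations

b1 = sxy / sxx              # slope: b1 = 2.6
b0 = ybar - b1 * xbar       # intercept: b0 = 0.2 (up to floating-point rounding)
print(b0, b1)
```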
ASW 14.47
Solution
\[ \hat{y}=29.4+1.55 x \]
\[ \mathrm{MSR}=\mathrm{SSR} / 1=691.72 \]
\[ \mathrm{MSE}=\mathrm{SSE} /(n-2)=310.28 / 5=62.056 \]
\[ F=\mathrm{MSR} / \mathrm{MSE}=691.72 / 62.056=11.15 \]
Using the \(F\) table (1 degree of freedom in the numerator and 5 in the denominator), the \(p\)-value is between .01 and .025.
Using Excel, the \(p\)-value corresponding to \(F=11.15\) is .0206 . Because \(p\)-value \(\leq \alpha=.05\), we conclude that the two variables are related.
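The ANOVA arithmetic above can be checked directly from the sums of squares given in the solution (SSR = 691.72, SSE = 310.28, \(n = 7\)):

```python
# Sketch: F statistic for the significance of the regression
ssr, sse, n = 691.72, 310.28, 7

msr = ssr / 1          # one independent variable, so df = 1
mse = sse / (n - 2)    # df = n - 2 = 5
f = msr / mse
print(round(mse, 4), round(f, 2))
```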
(Scatterplot and residual plot not reproduced.)
The residual plot leads us to question the assumption of a linear relationship between x and y. Even though the relationship is significant at the .05 level of significance, it would be extremely dangerous to extrapolate beyond the range of the data.
In this problem, we will simulate a simple linear regression model with one independent variable and one dependent variable. We will then add outliers to the data and see how the OLS estimates change.
To do this, open a do-file and write:
Estimate the OLS regression of \(Y\) on \(X\). What are the estimated coefficients? Why?
Add an outlier to the data by making the first row of \(Y\) equal to 100:
Solution
The estimated intercept and slope should be close to 1 and 2, respectively, because there is no omitted-variable bias or endogeneity in the simulated model.
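The exercise's do-file is written in Stata and its contents are not reproduced above, so the sketch below is an illustrative Python analogue under the assumption that the data are generated as \(y = 1 + 2x + \epsilon\) with standard normal errors. It fits OLS on the clean data, then sets the first value of \(y\) to 100 and refits, showing how sensitive OLS is to a single outlier.

```python
# Sketch (assumed DGP: y = 1 + 2x + e, e ~ N(0,1)); the exercise's actual
# Stata do-file is not reproduced here.
import random

random.seed(1)
n = 200
x = [random.uniform(0, 10) for _ in range(n)]
y = [1 + 2 * xi + random.gauss(0, 1) for xi in x]

def ols(x, y):
    """Simple one-regressor OLS; returns (intercept, slope)."""
    m = len(x)
    xbar, ybar = sum(x) / m, sum(y) / m
    b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
         sum((xi - xbar) ** 2 for xi in x)
    return ybar - b1 * xbar, b1

b0, b1 = ols(x, y)             # close to the true values 1 and 2

y_out = [100.0] + y[1:]        # contaminate the first observation
b0_out, b1_out = ols(x, y_out)

sse = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))
sse_out = sum((yi - b0_out - b1_out * xi) ** 2 for xi, yi in zip(x, y_out))
print(b0, b1, b0_out, b1_out)  # the contaminated fit shifts toward the outlier
```

Because OLS minimizes *squared* residuals, a single extreme observation receives enormous weight, which is why the refitted line and the sum of squared errors change so sharply.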