Problem Set 3
EC031-S26
Group Project Assignment 1
For your first assignment for the group project, choose a group of 3-4 people and write their names down here.
Group Member 1: _________________
Group Member 2: _________________
Group Member 3: _________________
Group Member 4: _________________
- Choose a paper that you are all interested in
- Check if the data and code are available. If not, can you find a similar dataset that you can use to replicate the paper? (email me if you need help with this)
- If the replication code is available, check whether it uses only Stata.
- If you want to discuss the paper you chose, schedule time with me on my Google Calendar.
Note: The problems may differ based on the edition of the textbook you have.
Problem 1
ASW 8.48
- Margin of error: \(z_{.025} \frac{\sigma}{\sqrt{n}}=1.96 \frac{15}{\sqrt{54}}=4.00\)
- Confidence interval: \(\bar{x} \pm\) margin of error
\[ 33.77 \pm 4.00 \text { or } \$ 29.77 \text { to } \$ 37.77 \]
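The arithmetic above can be reproduced in Stata as a quick check. This is a minimal sketch that uses only the values stated in the solution (\(\sigma=15\), \(n=54\), \(\bar{x}=33.77\)); the scalar names `z` and `me` are illustrative and not part of the assignment.

```stata
* Problem 1 check: sigma is treated as known, so use the standard normal critical value
scalar z  = invnormal(0.975)        // z_.025, about 1.96
scalar me = z * 15 / sqrt(54)       // margin of error
display "Margin of error: " %5.2f me
display "95% CI: " %6.2f (33.77 - me) " to " %6.2f (33.77 + me)
```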
Problem 2
ASW 8.49
- \(\bar{x} \pm t_{.025}(s / \sqrt{n})\), with \(df=63\) and \(t_{.025}=1.998\):
\[ 252.45 \pm 1.998(74.50 / \sqrt{64})=252.45 \pm 18.61 \text { or } \$ 233.84 \text { to } \$ 271.06 \]
- Yes. The lower limit for the population mean at Niagara Falls is \(\$ 233.84\), which is greater than \(\$ 215.60\).
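The interval above can be checked with Stata's immediate form of `ci`, which takes summary statistics instead of a dataset. This is a sketch that assumes Stata 14 or later, where the syntax is `cii means`; to my understanding older versions use `cii 64 252.45 74.50` instead.

```stata
* Problem 2 check: sigma unknown, so the interval uses the t distribution with n - 1 = 63 df
display "t_.025 with 63 df: " %6.3f invttail(63, 0.025)
cii means 64 252.45 74.50, level(95)    // arguments: n, sample mean, sample sd
display "Lower limit: " %7.2f r(lb) "   Upper limit: " %7.2f r(ub)
```

The limits are read from the stored results `r(lb)` and `r(ub)` and should agree with the hand calculation up to rounding.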
Problem 3
ASW 9.1
- \(H_0: \mu \leq 600 \quad\) Manager’s claim
\[ H_a: \mu>600 \]
- We are not able to conclude that the manager’s claim is wrong.
- The manager’s claim can be rejected. We can conclude that \(\mu>600\).
Problem 4
ASW 9.5
- Conclude that the population mean annual consumption of beer and cider in Milwaukee is greater than 26.9 gallons and hence higher than throughout the United States.
- The type I error is rejecting \(H_0\) when it is true. This error occurs if the researcher concludes that the population mean annual consumption of beer and cider in Milwaukee is greater than 26.9 gallons when the population mean annual consumption of beer and cider in Milwaukee is actually less than or equal to 26.9 gallons.
- The type II error is accepting \(H_0\) when it is false. This error occurs if the researcher concludes that the population mean annual consumption of beer and cider in Milwaukee is less than or equal to 26.9 gallons when it is not.
Problem 5
ASW 9.54
\[ n=\frac{\left(z_\alpha+z_\beta\right)^2 \sigma^2}{\left(\mu_0-\mu_a\right)^2}=\frac{(1.645+1.28)^2(5)^2}{(10-9)^2}=214 \]
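The sample-size formula can be evaluated in Stata as a check. This sketch assumes the critical values 1.645 and 1.28 correspond to \(\alpha=.05\) and \(\beta=.10\); the scalar names `n_table` and `n_exact` are illustrative.

```stata
* Problem 5 check: sample size for a one-tailed test about a mean
* n_table uses the rounded table values from the solution
scalar n_table = (1.645 + 1.28)^2 * 5^2 / (10 - 9)^2
* n_exact uses unrounded normal quantiles for alpha = .05 and beta = .10
scalar n_exact = (invnormal(0.95) + invnormal(0.90))^2 * 5^2 / (10 - 9)^2
display "n from table values:        " %8.2f n_table
display "n from unrounded quantiles: " %8.2f n_exact
```

The table values give about 213.89, which rounds up to 214. The unrounded quantiles give a value slightly above 214, so the two approaches can differ by one observation purely because \(z_\beta\) is rounded to 1.28.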
Problem 6
ASW 5.66
- Because the shipment is large, we can assume that the probabilities do not change from trial to trial and use the binomial probability distribution.
- \(n=5\)
\[ f(0)=\binom{5}{0}(0.01)^0(0.99)^5=.9510 \]
- \(f(1)=\binom{5}{1}(0.01)^1(0.99)^4=.0480\)
- \(1-f(0)=1-.9510=.0490\)
- No, the probability of finding one or more items in the sample defective when only \(1 \%\) of the items in the population are defective is small (only .0490). I would consider it likely that more than \(1 \%\) of the items are defective.
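The binomial probabilities above can be verified with Stata's built-in `binomialp(n, k, p)` function, which returns the probability of exactly `k` successes in `n` trials. A minimal sketch using the values from the solution:

```stata
* Problem 6 check: binomial probabilities with n = 5 trials and p = .01
display "f(0)     = " %6.4f binomialp(5, 0, 0.01)
display "f(1)     = " %6.4f binomialp(5, 1, 0.01)
display "1 - f(0) = " %6.4f (1 - binomialp(5, 0, 0.01))
```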
Problem 7
ASW 5.69
\(\mu = 15\)
\(P(20 \text{ or more arrivals}) = f(20) + f(21) + \cdots = 0.0418 + 0.0299 + 0.0204 + 0.0133 + 0.0083 + 0.0050 + 0.0029 + 0.0016 + 0.0009 + 0.0004 + 0.0002 + 0.0001 + 0.0001 = 0.1249\)
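Rather than summing table entries, the upper-tail probability can be computed directly in Stata. This sketch uses `poissontail(m, k)`, which returns \(P(X \geq k)\) for a Poisson distribution with mean `m`, and cross-checks it against the complement of the cumulative function `poisson(m, k)`.

```stata
* Problem 7 check: P(X >= 20) for a Poisson distribution with mean 15
display "P(20 or more arrivals) = " %6.4f poissontail(15, 20)
display "Same via complement    = " %6.4f (1 - poisson(15, 19))
```

The result may differ from 0.1249 in the fourth decimal place, because the hand calculation adds table entries that have each been rounded.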
Problem 8
ASW 5.73
- Hypergeometric \(N=52, n=5\) and \(r=4\).
\[ f(2)=\frac{\binom{4}{2}\binom{48}{3}}{\binom{52}{5}}=\frac{6(17296)}{2,598,960}=.0399 \]
\[ f(1)=\frac{\binom{4}{1}\binom{48}{4}}{\binom{52}{5}}=\frac{4(194580)}{2,598,960}=.2995 \]
\[ f(0)=\frac{\binom{4}{0}\binom{48}{5}}{\binom{52}{5}}=\frac{1,712,304}{2,598,960}=.6588 \]
- \(1-f(0)=1-.6588=.3412\)
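The hypergeometric probabilities can be checked with Stata's `hypergeometricp(N, K, n, k)` function, where `N` is the population size, `K` the number of items of interest in the population (\(r=4\) in the solution's notation), `n` the sample size, and `k` the number of items of interest in the sample. A minimal sketch:

```stata
* Problem 8 check: hypergeometric probabilities with N = 52, r = 4, n = 5
display "f(2)     = " %6.4f hypergeometricp(52, 4, 5, 2)
display "f(1)     = " %6.4f hypergeometricp(52, 4, 5, 1)
display "f(0)     = " %6.4f hypergeometricp(52, 4, 5, 0)
display "1 - f(0) = " %6.4f (1 - hypergeometricp(52, 4, 5, 0))
* The same f(2) written out with binomial coefficients
display "f(2)     = " %6.4f (comb(4, 2) * comb(48, 3) / comb(52, 5))
```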
Applications with Stata
Harmonized LSMS Dataset
For this Stata assignment we will use the harmonized LSMS dataset that we used in the previous problem set. This time, though, we will merge two of those datasets to conduct our analysis. Suppose we want to know how household dietary diversity (HDDS) is related to experiencing a flood event.
Create a .do file in which you do the following steps. Submit your do-file with a picture of the bar graph you create below. Note that you should have at least one line of code for each of the parts.
- Open `Household_dataset.dta` and merge in `Plot_dataset.dta` using the `merge` command. Type `help merge` if needed. Keep in mind that there are multiple observations per household in `Plot_dataset.dta`, because a household can have multiple plots and the dataset covers multiple countries, waves, and seasons. This means you should use a `1:m` merge on `country wave season hh_id_obs`. How many observations were matched in the merge?
- Drop observations that were not matched (`drop if _merge != 3`). Also drop observations where `flood_shock` is missing.
- What is the mean of the `flood_shock` variable? What does the mean tell you about the frequency of flood events in the sample?
- Use the `summarize` command to derive the sample mean and standard deviation of HDDS for households that experience a flood shock. Then calculate the two-sided 95% confidence interval by hand. Show your work.
- Generate a 95% two-sided confidence interval in Stata for the same mean, and confirm that it matches the interval you calculated by hand.
- Now generate a 90% two-sided confidence interval of HDDS for households that experience a flood shock. Is it narrower or wider than the 95% confidence interval?
- Now type `bys flood_shock: summarize HDDS yield_kg`. What do you observe?
- Generate a 1-0 variable that indicates whether distance to the nearest population center is more than the mean distance. Call it something intuitive like `far_from_center`. Generate another variable that is 1 if HDDS is higher than 6 and 0 otherwise. Run `summarize` on these two variables. What is the interpretation of the means?
- Compute 95% confidence intervals for your 1-0 variable for \(HDDS>6\) for those that live close to the city vs. far from the city. (Note that when using the `ci` command you are now dealing with a sample proportion, not a sample mean, so you need to use `ci proportions`.)
- What do your confidence intervals suggest about whether households with higher diet diversity live farther from the center of a city? Would you reject or fail to reject a hypothesis that the frequency of households with high diet diversity is the same close to and far away from the city?
merge 1:m country wave season hh_id_obs using "<path to Plot_dataset.dta>"
sum flood_shock
- Only about 2% of the households experienced a flood shock, so floods are a relatively rare event in this sample.
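The merge and cleaning steps can be collected into the start of a do-file along these lines. This is a sketch and not the official solution: the file paths are placeholders that assume both datasets sit in the working directory, and it assumes `flood_shock` is coded 0/1.

```stata
* Sketch of the merge and cleaning steps (file paths are placeholders)
use "Household_dataset.dta", clear
merge 1:m country wave season hh_id_obs using "Plot_dataset.dta"
tabulate _merge                  // the row with _merge == 3 counts the matched observations
drop if _merge != 3              // keep matched observations only
drop if missing(flood_shock)     // remove observations with no flood information
summarize flood_shock            // the mean of a 0/1 variable is the share of 1s
```

The `merge` command also prints a table of matched and unmatched observations, so `tabulate _merge` is only a convenient way to see the same counts again.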
sum HDDS if flood_shock == 1
- The mean is 7.973296 and the standard deviation is 2.16. The 95% confidence interval is \(7.97 \pm 1.96 \frac{2.16}{\sqrt{3187}} = 7.97 \pm 0.076\) or \(7.89\) to \(8.05\).
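The hand calculation can also be reproduced from the results that `summarize` stores, which avoids retyping rounded numbers. This is a minimal sketch; the scalar names prefixed with `hdds_` are illustrative.

```stata
* By-hand 95% CI for mean HDDS among flood-affected households, using stored results
quietly summarize HDDS if flood_shock == 1
scalar hdds_mean = r(mean)
scalar hdds_se   = r(sd) / sqrt(r(N))                     // standard error of the mean
scalar hdds_lb   = hdds_mean - invnormal(0.975) * hdds_se
scalar hdds_ub   = hdds_mean + invnormal(0.975) * hdds_se
display "95% CI: " %6.3f hdds_lb " to " %6.3f hdds_ub
```

This uses the normal critical value 1.96, as in the hand calculation. `ci means` uses the t distribution instead, but with more than 3,000 observations the two intervals should be nearly identical.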
ci means HDDS if flood_shock == 1
ci means HDDS if flood_shock == 1, level(90)
- The 90% confidence interval is narrower than the 95% confidence interval, because a lower confidence level uses a smaller critical value.
- Households that experienced flood events seem to have lower yields (as expected), but higher diet diversity, which is peculiar… However, several factors could be driving a result like this. For instance, households that are more resilient to floods may have higher diet diversity, and those that are more prepared might be able to cope better. Or perhaps households usually depend on their own crops for own-consumption; when a flood wipes that out, they have to rely on other sources of food, which could be more diverse.
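* Stata treats missing values as larger than any number, so exclude them from the 1-0 variables
egen mean_distance = mean(distance)
gen far_from_center = distance > mean_distance if !missing(distance)
gen high_hdds = HDDS > 6 if !missing(HDDS)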
summarize far_from_center high_hdds
ci proportions high_hdds if far_from_center == 0
ci proportions high_hdds if far_from_center == 1
- The confidence intervals suggest that a larger share of households living close to the city have high diet diversity than of households living far from it. Because the confidence intervals do not overlap, we can reject the hypothesis that the two proportions are equal.
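Comparing confidence intervals is an informal check, and a two-sample test of proportions addresses the same hypothesis directly. The sketch below assumes `high_hdds` and `far_from_center` were created as above; it goes beyond what the assignment asks for and is shown only as a complement.

```stata
* Both proportion CIs in one command, then a formal two-sample test of proportions
bysort far_from_center: ci proportions high_hdds
prtest high_hdds, by(far_from_center)    // H0: the two proportions are equal
```

Non-overlapping 95% intervals imply that the test rejects equality at the 5% level, but the reverse does not hold: intervals can overlap slightly while the test still rejects.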