Problem Set 3

EC031-S26

Author

Aleksandr Michuda

Group Project Assignment 1

For your first assignment for the group project, choose a group of 3-4 people and write their names down here.

Group Member 1: _________________

Group Member 2: _________________

Group Member 3: _________________

Group Member 4: _________________

Choose a paper that you are all interested in
Check if the data and code are available. If not, can you find a similar dataset that you can use to replicate the paper? (email me if you need help with this)
If the replication code is available, check whether it uses only Stata.
If you want to discuss the paper you chose, schedule time with me on my Google Calendar.

Note: The problems may differ based on the edition of the textbook you have.

Problem 1

ASW 8.48

Solution

Margin of error: $z_{.025} \frac{\sigma}{\sqrt{n}}=1.96 \frac{15}{\sqrt{54}}=4.00$
Confidence interval: $\bar{x} \pm$ margin of error

\[ 33.77 \pm 4.00 \text { or } \$ 29.77 \text { to } \$ 37.77 \]
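As an optional cross-check, the margin of error can be reproduced with Stata's `display` command. This is a minimal sketch that plugs in the values from the solution above ($\sigma=15$, $n=54$, $\bar{x}=33.77$); it is not part of the textbook solution.

```stata
* z critical value for a 95% two-sided interval (about 1.96)
display invnormal(0.975)
* margin of error: z * sigma / sqrt(n), about 4.00
display invnormal(0.975) * 15 / sqrt(54)
* lower and upper limits of the interval
display 33.77 - invnormal(0.975) * 15 / sqrt(54)
display 33.77 + invnormal(0.975) * 15 / sqrt(54)
```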

Problem 2

ASW 8.49

Solution

$\bar{x} \pm t_{.025}(s / \sqrt{n})$, with $df=63$ and $t_{.025}=1.998$

\[ 252.45 \pm 1.998(74.50 / \sqrt{64}) \]

\[ 252.45 \pm 18.61 \text { or } \$ 233.84 \text { to } \$ 271.06 \]
Yes. The lower limit for the population mean at Niagara Falls is $\$ 233.84$, which is greater than $\$ 215.60$.
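The same interval can be checked in Stata. The sketch below uses only the summary statistics quoted in the solution ($n=64$, $\bar{x}=252.45$, $s=74.50$). The `cii means` syntax is the one used by recent Stata releases; older versions may expect a different form of `cii`, so treat that line as an assumption about your installation.

```stata
* t critical value with 63 degrees of freedom (about 1.998)
display invttail(63, 0.025)
* margin of error: t * s / sqrt(n)
display invttail(63, 0.025) * 74.50 / sqrt(64)
* immediate form of ci: number of observations, sample mean, sample sd
cii means 64 252.45 74.50
```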

Problem 3

ASW 9.1

Solution

$H_0: \mu \leq 600 \quad$ Manager’s claim

\[ H_a: \mu>600 \]

If $H_0$ cannot be rejected: we are not able to conclude that the manager’s claim is wrong.
If $H_0$ can be rejected: the manager’s claim can be rejected. We can conclude that $\mu>600$.

Problem 4

ASW 9.5

Solution

Conclude that the population mean annual consumption of beer and cider in Milwaukee is greater than 26.9 gallons and hence higher than throughout the United States.
The type I error is rejecting $H_0$ when it is true. This error occurs if the researcher concludes that the population mean annual consumption of beer and cider in Milwaukee is greater than 26.9 gallons when the population mean annual consumption of beer and cider in Milwaukee is actually less than or equal to 26.9 gallons.
The type II error is accepting $H_0$ when it is false. This error occurs if the researcher concludes that the population mean annual consumption of beer and cider in Milwaukee is less than or equal to 26.9 gallons when it is not.

Problem 5

ASW 9.54

Solution

\[ n=\frac{\left(z_\alpha+z_\beta\right)^2 \sigma^2}{\left(\mu_0-\mu_a\right)^2}=\frac{(1.645+1.28)^2(5)^2}{(10-9)^2}=213.89 \text {, so use } n=214 \]
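The arithmetic can be reproduced in Stata as a quick check. This sketch uses the rounded table values from the solution (1.645 and 1.28) rather than exact quantiles, and the scalar name `n_raw` is only illustrative.

```stata
* required sample size before rounding, using the table values 1.645 and 1.28
scalar n_raw = (1.645 + 1.28)^2 * 5^2 / (10 - 9)^2
display n_raw
* round up to the next whole observation
display ceil(n_raw)
```

The sample size is rounded up rather than to the nearest integer because a smaller sample would not meet the stated error probabilities.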

Problem 6

ASW 5.66

Solution

Because the shipment is large, we can assume that the probabilities do not change from trial to trial and use the binomial probability distribution.

$n=5$

\[ f(0)=\binom{5}{0}(0.01)^0(0.99)^5=.9510 \]

$f(1)=\binom{5}{1}(0.01)^1(0.99)^4=.0480$
$1-f(0)=1-.9510=.0490$
No, the probability of finding one or more items in the sample defective when only $1 \%$ of the items in the population are defective is small (only .0490). I would consider it likely that more than $1 \%$ of the items are defective.
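These binomial probabilities can be verified with Stata's built-in probability functions. The sketch assumes $n=5$ trials and a defect probability of 0.01, as in the solution above.

```stata
* P(exactly 0 defective) and P(exactly 1 defective) in a sample of 5
display binomialp(5, 0, 0.01)
display binomialp(5, 1, 0.01)
* P(1 or more defective), computed two equivalent ways
display 1 - binomialp(5, 0, 0.01)
display binomialtail(5, 1, 0.01)
```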

Problem 7

ASW 5.69

Solution

$\mu = 15$
$P(20 \text{ or more arrivals}) = f(20) + f(21) + \cdots = 0.0418 + 0.0299 + 0.0204 + 0.0133 + 0.0083 + 0.0050 + 0.0029 + 0.0016 + 0.0009 + 0.0004 + 0.0002 + 0.0001 + 0.0001 = 0.1249$
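Rather than summing table entries, the upper-tail probability can be computed directly in Stata. This is a cross-check under the same assumption of a Poisson distribution with mean 15; the result should be close to the value above, with small differences in the last decimal because the table entries are rounded.

```stata
* P(X >= 20) for a Poisson random variable with mean 15
display poissontail(15, 20)
* the first term of the sum, f(20), for comparison with the table
display poissonp(15, 20)
```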

Problem 8

ASW 5.73

Solution

Hypergeometric $N=52, n=5$ and $r=4$.

\[ \frac{\binom{4}{2}\binom{48}{3}}{\binom{52}{5}}=\frac{6(17296)}{2,598,960} \]

\[ =.0399 \]

\[ \frac{\binom{4}{1}\binom{48}{4}}{\binom{52}{5}}=\frac{4(194580)}{2,598,960} \]

\[ =.2995 \]

\[ \begin{aligned} & \frac{\binom{4}{0}\binom{48}{5}}{\binom{52}{5}}=\frac{1,712,304}{2,598,960} \\ & =.6588 \end{aligned} \]

$1-f(0)=1-.6588=.3412$
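The hypergeometric probabilities can be checked in Stata either with `comb()` or with `hypergeometricp()`. In the sketch below the arguments of `hypergeometricp()` are, in order, the population size (52), the number of items of interest in the population (4), the sample size (5), and the number of items of interest in the sample.

```stata
* P(exactly 2) computed from binomial coefficients, as in the solution
display comb(4, 2) * comb(48, 3) / comb(52, 5)
* the same probabilities from the built-in function
display hypergeometricp(52, 4, 5, 2)
display hypergeometricp(52, 4, 5, 1)
display hypergeometricp(52, 4, 5, 0)
* P(at least 1)
display 1 - hypergeometricp(52, 4, 5, 0)
```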

Applications with Stata

Harmonized LSMS Dataset

For this Stata assignment we will use the harmonized LSMS dataset that we used in the previous problem set. In this case, though, we will merge two of those datasets together to conduct our analysis. Suppose we want to know how household diet diversity (the HDDS variable) is related to experiencing a flood event.

Create a .do file in which you do the following steps. Submit your do-file with a picture of the bar graph you create below. Note that you should have at least one line of code for each of the parts.

Open Household_dataset.dta and merge in Plot_dataset.dta using the merge command. Type help merge if needed. Keep in mind that there are multiple observations per household in the Plot_dataset.dta as a household will have multiple plots and this dataset has multiple countries, waves and seasons. This means you should use a 1:m merge on country wave season hh_id_obs. How many observations were matched in the merge?
Drop observations that were not matched. Also drop observations where flood_shock is missing.

drop if _merge != 3

What is the mean of the flood_shock variable? What does the mean tell you about the frequency of flood events in the sample?
Use the summarize command to derive the sample mean and standard deviation of HDDS for households that experience a flood shock. Then calculate the two-sided 95% confidence interval by hand. Show your work.
Now have Stata generate the 95% two-sided confidence interval that you calculated by hand in the previous part, and confirm that the two match.
Now generate a 90% two-sided confidence interval of HDDS for households that experience a flood shock. Is it narrower or wider than the 95% confidence interval?
Now type bys flood_shock: summarize HDDS yield_kg. What do you observe?
Generate a 1-0 variable that indicates whether distance to the nearest population center is more than the mean distance. Call it something intuitive like far_from_center. Generate another variable that is 1 if HDDS is higher than 6 and 0 otherwise. Run summarize on these two variables. What is the interpretation of the means?
Compute 95% confidence intervals for your 1-0 variable for $HDDS>6$ for those that live close to the city vs. far from the city. (Note when using the ci command that now you are dealing with a sample proportion, not a sample mean, so you need to use ci proportions.)
What do your confidence intervals suggest about whether households with higher diet diversity live farther from the center of a city? Would you reject or fail to reject a hypothesis that frequency of households with high diet diversity is the same close to and far away from the city?

Solution

merge 1:m country wave season hh_id_obs using "<path to Plot_dataset.dta>"
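The steps that follow the merge could look like the sketch below. It assumes Household_dataset.dta was in memory when the merge ran, so that `_merge` exists; the number of matched observations is the count with `_merge == 3`.

```stata
* tabulate the merge results; matched observations have _merge == 3
tabulate _merge
count if _merge == 3
* keep only matched observations, then drop missing flood_shock
keep if _merge == 3
drop if missing(flood_shock)
```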

sum flood_shock

Only about 2% of the households experienced a flood shock, which is a relatively rare event.

sum HDDS if flood_shock == 1

The mean is 7.973296 and the standard deviation is 2.16. The 95% confidence interval is $7.973 \pm 1.96 \frac{2.16}{\sqrt{3187}} = 7.973 \pm 0.075$, or about $7.90$ to $8.05$.
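The hand calculation can also be scripted from the results that `summarize` stores in `r()`. This is a minimal sketch; the scalar name `se` is illustrative. Note that `ci means` uses a t critical value rather than 1.96, so tiny differences in the later decimals are expected with a sample this large.

```stata
* store the mean, sd, and N of HDDS for flood-affected households
quietly summarize HDDS if flood_shock == 1
* standard error of the mean: s / sqrt(n)
scalar se = r(sd) / sqrt(r(N))
* lower and upper limits using the normal critical value 1.96
display r(mean) - 1.96 * se
display r(mean) + 1.96 * se
```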

ci means HDDS if flood_shock == 1

ci means HDDS if flood_shock == 1, level(90)
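The 90% interval is narrower than the 95% interval: the sample mean and standard error are unchanged, and only the critical value shrinks. A quick way to see this, using normal critical values as an approximation for this large sample:

```stata
* critical value for a 95% two-sided interval (about 1.96)
display invnormal(0.975)
* critical value for a 90% two-sided interval (about 1.645)
display invnormal(0.95)
```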

Households that experienced flood events seem to have lower yields (as expected), but higher diet diversity, which is peculiar… However, there may be different factors that could be driving a result like this. For instance, households that are more resilient to floods may have higher diet diversity and those that are more prepared might be able to cope better. Or perhaps households usually depend on their own crops for own-consumption; when a flood wipes that out, they have to rely on other sources of food, which could be more diverse.

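* Stata treats missing values as larger than any number, so exclude them explicitly
egen mean_distance = mean(distance)
gen far_from_center = distance > mean_distance if !missing(distance)
gen high_hdds = HDDS > 6 if !missing(HDDS)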

summarize far_from_center high_hdds

ci proportions high_hdds if far_from_center == 0
ci proportions high_hdds if far_from_center == 1

The confidence intervals suggest that a larger share of households close to the city have high diet diversity than of households far from it. Since the confidence intervals don’t overlap, we can reject the hypothesis that the two proportions are equal.
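Comparing two confidence intervals is an informal check. As an optional supplement that is not part of the original solution, a two-sample test of proportions gives a formal test of the same hypothesis. The sketch assumes the variables high_hdds and far_from_center created earlier.

```stata
* two-sample test of equal proportions of high_hdds, close vs. far
prtest high_hdds, by(far_from_center)
```

A small p-value for the two-sided alternative would agree with the conclusion drawn from the non-overlapping intervals.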
