Problem Set 3

EC031-S26

Author

Aleksandr Michuda

Group Project Assignment 1

For your first group project assignment, form a group of 3-4 people and write the members' names below.

Group Member 1: _________________

Group Member 2: _________________

Group Member 3: _________________

Group Member 4: _________________

  1. Choose a paper that you are all interested in
  2. Check if the data and code are available. If not, can you find a similar dataset that you can use to replicate the paper? (email me if you need help with this)
  3. If the replication code is available, check whether it uses only Stata.
  4. If you want to discuss the paper you chose, schedule time with me on my Google Calendar.

Note: The problems may differ based on the edition of the textbook you have.

Problem 1

ASW 8.48

Problem 2

ASW 8.49

Problem 3

ASW 9.1

Problem 4

ASW 9.5

Problem 5

ASW 9.54

Problem 6

ASW 5.66

Problem 7

ASW 5.69

Problem 8

ASW 5.73

Applications with Stata

Harmonized LSMS Dataset

For this Stata assignment we will use the harmonized LSMS dataset that we used in the previous problem set. This time, though, we will merge two of its datasets together to conduct our analysis. Suppose we want to know how household dietary diversity (HDDS) is related to experiencing a flood event.

Create a .do file in which you do the following steps. Submit your do-file with a picture of the bar graph you create below. Note that you should have at least one line of code for each of the parts.

  1. Open Household_dataset.dta and merge in Plot_dataset.dta using the merge command. Type help merge if needed. Keep in mind that there are multiple observations per household in the Plot_dataset.dta as a household will have multiple plots and this dataset has multiple countries, waves and seasons. This means you should use a 1:m merge on country wave season hh_id_obs. How many observations were matched in the merge?

  2. Drop observations that were not matched. Also drop observations where flood_shock is missing.

drop if _merge != 3
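Before running the merge on the real files, it may help to try the same steps on toy data. The sketch below is only an illustration: the two tiny datasets, their values, and the single key hh_id are made up, whereas the assignment merges on country wave season hh_id_obs.

```stata
* Toy plot-level data (hypothetical values): household 1 has two plots
clear
input hh_id plot_id area
1 1 0.5
1 2 1.2
2 1 0.8
end
tempfile plots
save `plots'

* Toy household-level data (hypothetical values): household 3 has no plots
clear
input hh_id hh_size flood_shock
1 4 0
2 6 .
3 3 1
end

* One household row can match many plot rows, hence 1:m
merge 1:m hh_id using `plots'

* _merge == 3 marks matched rows; tabulate shows how many fall in each category
tabulate _merge

* Keep matched rows only, then drop rows where the shock indicator is missing
drop if _merge != 3
drop if missing(flood_shock)
list
```

Run the block as a whole from a do-file, since the tempfile macro only exists while that do-file is executing.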
  1. What is the mean of the flood_shock variable? What does the mean tell you about the frequency of flood events in the sample?

  2. Use the summarize command to derive the sample mean and standard deviation of HDDS for households that experience a flood shock. Then calculate the two-sided 95% confidence interval by hand. Show your work.

  3. Use the ci command to generate a 95% two-sided confidence interval for the mean you calculated in the previous part (confirm that it matches the CI you calculated by hand).

  4. Now generate a 90% two-sided confidence interval of HDDS for households that experience a flood shock. Is it narrower or wider than the 95% confidence interval?

  5. Now run the command bys flood_shock: summarize HDDS yield_kg and describe what you observe.

  6. Generate a 1-0 variable that indicates whether the distance to the nearest population center is more than the mean distance. Call it something intuitive like far_from_center. Generate another variable that is 1 if HDDS is higher than 6 and 0 otherwise. Run summarize on these two variables. What is the interpretation of the means?

  7. Compute 95% confidence intervals for your 1-0 variable for \(HDDS>6\) for those that live close to the city vs. far from the city. (Note when using the ci command that now you are dealing with a sample proportion, not a sample mean, so you need to use ci proportions.)

  8. What do your confidence intervals suggest about whether households with higher diet diversity live farther from the center of a city? Would you reject or fail to reject the hypothesis that the frequency of households with high diet diversity is the same close to and far away from the city?
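The confidence-interval steps above can be sketched on a stand-in dataset. This is not a solution to the assignment: it uses Stata's built-in auto data, and the variables mpg, price, foreign, and high_price are placeholders for the LSMS variables. The by-hand interval assumes the usual formula for an unknown population standard deviation, \(\bar{x} \pm t_{\alpha/2,\,n-1}\, s/\sqrt{n}\).

```stata
* Stand-in data: Stata's built-in auto dataset, not the LSMS files
sysuse auto, clear

* By hand: mean plus or minus the t critical value times the standard error
summarize mpg if foreign == 1
scalar xbar   = r(mean)
scalar sd_x   = r(sd)
scalar n_obs  = r(N)
* invttail(df, p) returns the t value with upper-tail probability p
scalar t_crit = invttail(n_obs - 1, 0.025)
scalar half_w = t_crit * sd_x / sqrt(n_obs)
scalar ci_lo  = xbar - half_w
scalar ci_hi  = xbar + half_w
display "95% CI by hand: [" ci_lo ", " ci_hi "]"

* Built-in equivalent at the default 95% level, then at the 90% level
ci means mpg if foreign == 1
ci means mpg if foreign == 1, level(90)

* 1-0 indicator for "above the sample mean"; the if-condition keeps missing
* values missing, because Stata treats missing as larger than any number
summarize price
generate byte high_price = price > r(mean) if !missing(price)

* Confidence interval for a proportion, one group at a time
ci proportions high_price if foreign == 0
ci proportions high_price if foreign == 1
```

The hand-computed interval and the first ci means line should agree, which is the same check the assignment asks for with HDDS.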
