Problem Set 2
EC031-S26
Important: Be neat or type. You can download this problem set as a PDF, Word document, or Markdown file.
Note: The problems may differ based on the edition of the textbook you have.
Problem 1
ASW 4.22
- \(P(\mathrm{A})=.40,\ P(\mathrm{B})=.40,\ P(\mathrm{C})=.60\)
- \(\mathrm{A}^c=\left\{\mathrm{E}_3, \mathrm{E}_4, \mathrm{E}_5\right\},\ \mathrm{C}^c=\left\{\mathrm{E}_1, \mathrm{E}_4\right\},\ P\left(\mathrm{A}^c\right)=.60,\ P\left(\mathrm{C}^c\right)=.40\)
- \(\mathrm{A} \cup \mathrm{B}^c=\left\{\mathrm{E}_1, \mathrm{E}_2, \mathrm{E}_5\right\},\ P\left(\mathrm{A} \cup \mathrm{B}^c\right)=.60\)
- \(P(\mathrm{B} \cup \mathrm{C})=P\left(\mathrm{E}_2, \mathrm{E}_3, \mathrm{E}_4, \mathrm{E}_5\right)=.80\)
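The stated probabilities can be cross-checked against each other with the complement rule and the addition law. The lines below use only the values reported above; the overlap \(P(\mathrm{B} \cap \mathrm{C})\) is not given in the solution and is shown here only as the value those numbers imply.

\[ P\left(\mathrm{A}^c\right) = 1 - P(\mathrm{A}) = 1 - .40 = .60 \]

\[ P(\mathrm{B} \cap \mathrm{C}) = P(\mathrm{B}) + P(\mathrm{C}) - P(\mathrm{B} \cup \mathrm{C}) = .40 + .60 - .80 = .20 \]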
Problem 2
ASW 4.41
\(S_1=\) successful, \(S_2=\) not successful, and \(B=\) request received for additional information.
- \(P\left(S_1\right)=.50\)
- \(P\left(B \mid S_1\right)=.75\)
- \(P\left(S_1 \mid B\right)=\frac{(.50)(.75)}{(.50)(.75)+(.50)(.40)}=\frac{.375}{.575}=.65\)
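The Bayes' theorem arithmetic can be checked in Stata with scalars. This is a minimal sketch: the scalar names are illustrative, and the values for \(P(S_2)\) and \(P(B \mid S_2)\) are read off the denominator of the formula above rather than stated separately in the solution.

```stata
* Priors and likelihoods taken from the solution above
scalar p_s1   = .50   // P(S1)
scalar p_s2   = .50   // P(S2), read off the denominator
scalar p_b_s1 = .75   // P(B | S1)
scalar p_b_s2 = .40   // P(B | S2), read off the denominator

* Bayes' theorem: P(S1 | B) = P(S1)P(B|S1) / [P(S1)P(B|S1) + P(S2)P(B|S2)]
scalar p_s1_b = (p_s1 * p_b_s1) / (p_s1 * p_b_s1 + p_s2 * p_b_s2)
display %5.2f p_s1_b
```

Run from a do-file, the displayed value should agree with the posterior probability reported above.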
Problems 3-5 have now been added to PS3.
Problem 6
ASW 5.21
- Executives: \(E(x)=\Sigma x f(x)=0.05(1)+0.09(2)+0.03(3)+0.42(4)+0.41(5)=4.05\)
- Middle managers: \(E(x)=\Sigma x f(x)=0.04(1)+0.10(2)+0.12(3)+0.46(4)+0.28(5)=3.84\)
- Executives: \(\sigma^2=\Sigma(x-\mu)^2 f(x)=1.25\). Middle managers: \(\sigma^2=\Sigma(x-\mu)^2 f(x)=1.13\)
- Executives: \(\sigma=1.12\). Middle managers: \(\sigma=1.07\)
- The senior executives have a higher average score: 4.05 versus 3.84 for the middle managers. The executives also have a slightly higher standard deviation.
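The variances above come from weighting each squared deviation from the mean by its probability. Written out term by term, with the means 4.05 and 3.84 and the same probabilities used in the expected values (the subscripts are labels added here for readability):

\[ \sigma^2_{\text{exec}} = 0.05(1-4.05)^2+0.09(2-4.05)^2+0.03(3-4.05)^2+0.42(4-4.05)^2+0.41(5-4.05)^2 = 1.2475 \approx 1.25 \]

\[ \sigma^2_{\text{mid}} = 0.04(1-3.84)^2+0.10(2-3.84)^2+0.12(3-3.84)^2+0.46(4-3.84)^2+0.28(5-3.84)^2 = 1.1344 \approx 1.13 \]

Taking square roots gives \(\sqrt{1.2475} \approx 1.12\) and \(\sqrt{1.1344} \approx 1.07\).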
Problem 7
ASW 4.48
- 0.5029
- 0.5758
- No, from part a we have \(P(F)=0.5029\), and from part b we have \(P(A \mid F)=0.5758\).
Because \(P(F) \neq P(A \mid F)\), events \(A\) and \(F\) are not independent.
Problem 8
ASW 3.62
Problem 9
ASW 3.66
Problem 10
ASW 4.55
Problem 11
ASW 3.39
Problem 12
ASW 4.48
STATA Exercise: Cleaning data and data wrangling
Make sure to submit the do-file along with the rest of your problem set.
Many datasets that you will encounter are publicly available. For this question you will download a “harmonized” dataset from the World Bank Living Standards Measurement Study (LSMS). This dataset provides a standardized survey across many less-developed economies. “Harmonization” means that the data have been cleaned and standardized across countries, because variable names and coding schemes often differ from one country's survey to the next.
- Go to the GitHub repository for the harmonized dataset and download the data file LSMS_ISA_harmonised_dataset.dta. Save this file in your Econ 31 folder. Unzip the file into another folder. This new folder should contain four files with .dta extensions:
Individual_dataset.dta
Plotcrop_dataset.dta
Plot_dataset.dta
Household_dataset.dta
These four files are connected by different ID variables, such as a unique individual ID, a household ID, a country ID, or some other characteristic.
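To see how such a link works in practice, the sketch below first lists the variables in two of the files and then merges household characteristics onto individuals. It is only an illustration under stated assumptions: `hh_id` is a hypothetical name for the household ID (use whatever name `describe` actually shows), the working directory is assumed to be the unzipped folder, and if household IDs are only unique within a country the key would need the country variable as well.

```stata
* Inspect variable names in each file without loading the data
describe using "Household_dataset.dta"
describe using "Individual_dataset.dta"

* Hypothetical example: hh_id is an assumed name for the household ID.
* Replace it with the actual ID variable shown by describe.
use "Individual_dataset.dta", clear
merge m:1 hh_id using "Household_dataset.dta"

* _merge records which observations matched across the two files
tabulate _merge
```

The `m:1` form is used because many individuals can belong to one household, so the key must be unique in the household file but not in the individual file.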
- Open Stata and open up the Do-file editor, which should open a new window. At the top of this window, write these commands:
clear all
set more off
use "<path to EC031 folder>/Household_dataset.dta"
Save the do-file as ps2.do. Run the do-file. The dataset with all its variables should now be loaded into Stata.
Now add some commands to the bottom of your .do file.
- First, type describe. How many observations are there? How many variables?
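If the full describe output is too long to scan, the two counts can also be read directly. A minimal sketch, assuming the household dataset is already loaded:

```stata
* Header only: number of observations and variables, without the variable list
describe, short

* The same two numbers from Stata's c-class values
display "Observations: " c(N)
display "Variables:    " c(k)
```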
Take a look at the HDDS variable. HDDS is a measure of household dietary diversity, which is an important component of measuring food security. For the following questions, write your answers as comments in the do-file.
- Take a look at this description of how the HDDS is constructed: https://inddex.nutrition.tufts.edu/data4diets/indicator/household-dietary-diversity-score-hdds . What does the HDDS variable measure?
- What are some issues with the HDDS?
- What is the mean of the HDDS variable? Is this the mean for all households?
- What is the mean for urban households?
- What is the mean for rural households?
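One way to get the means asked for above is sketched below. It uses the variable names HDDS and urban as they appear in this problem set and makes no assumption about how urban is coded, since tabstat reports one row per value.

```stata
* Overall mean; summarize uses only households with a non-missing HDDS,
* so compare its Obs count with the total number of observations
summarize HDDS

* Mean and number of non-missing observations for each value of urban
tabstat HDDS, by(urban) statistics(mean n)
```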
Construct a histogram of the HDDS variable, by urban and rural. Use the histogram command with the by option. What do you see?
Create a variable with the egen command that is equal to 1 if HDDS is below the mean and 0 otherwise. Call this variable low_hdds:
egen mean_hdds = mean(HDDS)
gen low_hdds = HDDS < mean_hdds
Use tabulate to give a table of the number of households with low HDDS by urban and rural. What is the probability of having a low HDDS and being in an urban area?
What is the probability of having a low HDDS, conditional on being in an urban area?
- histogram HDDS, by(urban)
- tab low_hdds urban
\[ P(\text{low HDDS and urban}) = \frac{13{,}516}{143{,}306} = 0.0943 \]
- By the definition of conditional probability:
\[ P(\text{low HDDS} \mid \text{urban}) = \frac{P(\text{low HDDS and urban})}{P(\text{urban})} = \frac{13{,}516/143{,}306}{44{,}930/143{,}306} = \frac{13{,}516}{44{,}930} = 0.3008 \]
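Both probabilities can also be read from tabulate itself by asking for cell and column percentages, rather than dividing counts by hand. A minimal sketch, assuming low_hdds and urban exist as created above; note that tabulate reports percentages, not proportions.

```stata
* Joint probabilities: each cell as a percentage of all households in the table
tabulate low_hdds urban, cell

* Conditional probabilities: each cell as a percentage of its urban/rural column,
* so the low_hdds == 1 row of the urban column is P(low HDDS | urban)
tabulate low_hdds urban, column

* Arithmetic check using the counts reported above
display %6.4f 13516/143306
display %6.4f 13516/44930
```

Because urban is the column variable, the column option conditions on urban or rural status. The row option would instead condition on low_hdds.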