Problem Set 2

EC031-S26

Aleksandr Michuda

Important: Be neat or type. You can download this problem set as a PDF, Word document, or Markdown file.

Note: The problems may differ based on the edition of the textbook you have.

Problem 1

ASW 4.22

Solution

\(P(A)=.40,\ P(B)=.40,\ P(C)=.60\)
\(A^c=\{E_3, E_4, E_5\},\ C^c=\{E_1, E_4\},\ P(A^c)=.60,\ P(C^c)=.40\)
\(A \cup B^c=\{E_1, E_2, E_5\},\ P(A \cup B^c)=.60\)
\(P(B \cup C)=P(E_2, E_3, E_4, E_5)=.80\)
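
As a quick check (not part of the textbook solution), the two complement probabilities above agree with the complement rule applied to the stated values of \(P(A)\) and \(P(C)\):

\[ P(A^c)=1-P(A)=1-.40=.60, \qquad P(C^c)=1-P(C)=1-.60=.40 \]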

Problem 2

ASW 4.41

Solution

\(S_1=\) successful, \(S_2=\) not successful, and \(B=\) request received for additional information.

a. \(P(S_1)=.50\)
b. \(P(B \mid S_1)=.75\)
c. \(P(S_1 \mid B)=\frac{(.50)(.75)}{(.50)(.75)+(.50)(.40)}=\frac{.375}{.575}=.65\)
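
Part c is an application of Bayes' theorem. The general form is sketched below using the symbols defined in this solution. Reading off the denominator, it assumes \(P(S_2)=.50\) and \(P(B \mid S_2)=.40\); both values come from the textbook problem and are not restated here.

\[ P(S_1 \mid B)=\frac{P(S_1)\,P(B \mid S_1)}{P(S_1)\,P(B \mid S_1)+P(S_2)\,P(B \mid S_2)} \]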

Note

Problems 3-5 have now been added to PS3.

Problem 6

ASW 5.21

Solution

a. Executives: \(E(x)=\Sigma x f(x)=0.05(1)+0.09(2)+0.03(3)+0.42(4)+0.41(5)=4.05\)
b. Middle managers: \(E(x)=\Sigma x f(x)=0.04(1)+0.10(2)+0.12(3)+0.46(4)+0.28(5)=3.84\)
c. Executives: \(\sigma^2=\Sigma(x-\mu)^2 f(x)=1.25\); middle managers: \(\sigma^2=\Sigma(x-\mu)^2 f(x)=1.13\)
d. Executives: \(\sigma=1.12\); middle managers: \(\sigma=1.07\)
e. The senior executives have a higher average score: 4.05 versus 3.84 for the middle managers. The executives also have a slightly higher standard deviation.
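
For reference, the executives' variance reported above can be expanded term by term, using \(\mu=4.05\) and the probabilities from the expected-value calculation. This is a worked check, not part of the textbook solution:

\[
\begin{aligned}
\sigma^2 &= 0.05(1-4.05)^2+0.09(2-4.05)^2+0.03(3-4.05)^2+0.42(4-4.05)^2+0.41(5-4.05)^2 \\
&= 0.465125+0.378225+0.033075+0.001050+0.370025 \\
&= 1.2475 \approx 1.25, \qquad \sigma=\sqrt{1.2475}\approx 1.12
\end{aligned}
\]

The middle managers' variance follows the same pattern with \(\mu=3.84\) and their own probabilities.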

Problem 7

ASW 4.48

Solution

a. \(P(F)=0.5029\)
b. \(P(A \mid F)=0.5758\)
c. No, from part a we have \(P(F)=0.5029\), and from part b we have \(P(A \mid F)=0.5758\). Because \(P(F) \neq P(A \mid F)\), events \(A\) and \(F\) are not independent.

Problem 8

ASW 3.62

Problem 9

ASW 3.66

Problem 10

ASW 4.55

Problem 11

ASW 3.39

Problem 12

ASW 4.48

STATA Exercise: Cleaning data and data wrangling

Important

Make sure to submit the do-file along with the rest of your problem set.

Many of the datasets you will encounter are publicly available. For this question you will download a “harmonized” dataset from the World Bank Living Standards Measurement Study (LSMS). This dataset provides standardized survey data for many less-developed economies. “Harmonization” means that the data have been cleaned and standardized across countries, because variable names and coding schemes often differ from one country's survey to the next.

Go to the GitHub repository for the harmonized dataset and download the data file LSMS_ISA_harmonised_dataset.dta. Save this file in your Econ 31 folder. Unzip the file into another folder. This new folder should contain four files with .dta extensions:

Individual_dataset.dta
Plotcrop_dataset.dta
Plot_dataset.dta
Household_dataset.dta

These four files are linked by different ID variables, such as a unique individual ID, a household ID, or a country ID.

Open Stata and open up the Do-file editor, which should open a new window. At the top of this window, write these commands:

clear all
set more off
use "<path to EC031 folder>/Household_dataset.dta"

Save it as ps2.do. Run the do-file. The dataset with all its variables should now be loaded into Stata.

Now add some commands to the bottom of your .do file.
- First, type describe. How many observations are there? How many variables?
Take a look at the HDDS variable. HDDS is a measure of household dietary diversity, which is an important component of measuring food security. For the following questions, write your answers as comments in the do-file.
1. Take a look at this description of how the HDDS is constructed: https://inddex.nutrition.tufts.edu/data4diets/indicator/household-dietary-diversity-score-hdds . What does the HDDS variable measure?
2. What are some issues with the HDDS?
3. What is the mean of the HDDS variable? Is this the mean for all households?
4. What is the mean for urban households?
5. What is the mean for rural households?
Construct a histogram of the HDDS variable, by urban and rural. Use the histogram command with the by option. What do you see?
Use the egen command to store the mean of HDDS, then create a variable that is equal to 1 if HDDS is below the mean and 0 otherwise. Call this variable low_hdds:

egen mean_hdds = mean(HDDS)
gen low_hdds = HDDS < mean_hdds

Use tabulate to give a table of the number of households with low HDDS by urban and rural. What is the probability of having a low HDDS and being in an urban area?
What is the probability of having a low HDDS, conditional on being in an urban area?

Solution

histogram HDDS, by(urban)
tab low_hdds urban

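The probability of having a low HDDS and being in an urban area is the number of urban, low-HDDS households divided by the total number of households:

\[ P(\text{low HDDS and urban}) = 13,516/143,306 = 0.0943 \]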

By the definition of conditional probability:

\[ P(\text{low HDDS} \mid \text{urban}) = \frac{P(\text{low HDDS and urban})}{P(\text{urban})} = \frac{\frac{13,516}{143,306}}{\frac{44,930}{143,306}} = \frac{13,516}{44,930} = 0.3008 \]
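
The means asked for in questions 3 to 5 and the two probabilities above can also be read directly from Stata output. The lines below are a minimal sketch, not the official solution. They assume the variable names used in this problem set (HDDS, urban, low_hdds) and that low_hdds has already been generated. They make no assumption about how urban is coded, so check its value labels before reading the tables.

```stata
* Questions 3-5: overall mean, then the mean within each value of urban
* (summarize uses only observations where HDDS is not missing)
summarize HDDS
tabstat HDDS, by(urban) statistics(mean n)

* Joint distribution: each cell as a percentage of all households
tabulate low_hdds urban, cell nofreq

* Conditional distribution: each cell as a percentage of its urban/rural column
tabulate low_hdds urban, column nofreq
```

Stata reports these as percentages, so divide by 100 to compare them with the probabilities computed by hand. The number of observations shown by summarize also helps answer whether the mean covers all households.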