Does not hold if events are NOT mutually exclusive!
Why not?
How does the intersection factor in?
Other Useful Results based on the Postulates
\[
P(A') = 1 - P(A)
\tag{1}\]
\[
P(\emptyset) = 0
\tag{2}\]
\[
\text{If } A \subseteq B \text{, then } P(A) \leq P(B)
\tag{3}\]
Independence
A and B are independent if:
\[
P(A \cap B) = P(A)P(B)
\]
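This definition can be sanity-checked by brute-force enumeration. A minimal sketch (not from the slides) using two fair coin flips:

```python
from itertools import product

# Sample space: all ordered pairs of two fair coin flips, each equally likely
outcomes = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ...]
p = 1 / len(outcomes)                     # each outcome has probability 1/4

# Event A: first flip is heads; event B: second flip is heads
p_a = sum(p for o in outcomes if o[0] == "H")
p_b = sum(p for o in outcomes if o[1] == "H")
p_ab = sum(p for o in outcomes if o[0] == "H" and o[1] == "H")

print(p_ab == p_a * p_b)  # True: P(A ∩ B) = P(A)P(B), so the flips are independent
```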
Is this the same as mutually exclusive?
NO, in fact mutually exclusive events are VERY dependent.
If event A occurs, you KNOW event B has not occurred.
Independence vs. Mutual Exclusion
Mutually Exclusive Events
You can’t be both sitting and standing at the same time
A traffic light can’t be both red and green simultaneously
You can’t be both indoors and outdoors at once
A restaurant can’t be both open and closed at the same moment
Independent Events
Whether it’s raining in New York doesn’t affect how many likes your latest post receives
The outcome of a coin flip doesn’t affect the outcome of the next flip
The number of cars passing by your house doesn’t affect the stock market
Conditioning on an Event
Often we have situations where our evaluation of the probability of one event (e.g., event A) is changed by the knowledge that some other event (B) has occurred.
This is called conditioning on an event.
We have knowledge that an event already occurred
Almost like injecting “time” into our probability calculations
Conditional Probability
Another way to say it is that a conditional probability is a way to “slice” an event by another event
We know that B has occurred, so what is the probability of A occurring in that smaller sample space?
\[
P(A|B) = \frac{P(A \cap B)}{P(B)}
\]
Example
Suppose that the probability of majoring in economics is:
\[
P(\text{major}) = 0.21
\]
and the probability of identifying as a democrat is:
\[
P(\text{democrat}) = 0.5
\]
The probability of majoring in econ AND being a democrat is 0.06.
What is \(P(major | democrat)\)?
What is \(P(democrat | major )\)?
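Plugging the numbers above into the definition of conditional probability gives both answers; a quick numeric check:

```python
# Given probabilities from the example
p_major = 0.21     # P(major in econ)
p_democrat = 0.5   # P(democrat)
p_both = 0.06      # P(major AND democrat)

# Conditional probability: P(A|B) = P(A ∩ B) / P(B)
p_major_given_democrat = p_both / p_democrat   # condition on being a democrat
p_democrat_given_major = p_both / p_major      # condition on majoring in econ

print(round(p_major_given_democrat, 3))  # 0.12
print(round(p_democrat_given_major, 3))  # 0.286
```

Note that the two conditionals differ because the conditioning events have different probabilities.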
Conditioning on Multiple Events
We can also condition on multiple events
\[
P(A | B, C) = \frac{P(A \cap B \cap C)}{P(B \cap C)}
\]
With corresponding changes to Bayes Theorem and other formulas:
\[
P(A | B, C) = \frac{P(B | A, C)P(A | C)}{P(B | C)}
\]
Conditional Independence
Sometimes knowing that one event (B) has occurred gives us NO additional information about the likelihood of A occurring.
\[
P(A|B) = P(A)
\]
Does Independence Imply Conditional Independence?
Suppose A and B are independent
Are A and B independent, conditional on C?
Does \(P(A \cap B) = P(A)P(B)\) imply \(P(A \cap B | C) = P(A|C)P(B|C)\)?
In the first case, if A and B are independent, they can still be dependent conditional on C.
In the second case, if A and B are conditionally independent given C, they can still be dependent unconditionally.
This is an important distinction: neither kind of independence implies the other.
Bayes Theorem
We will soon see the difference between “frequentist” and “Bayesian” statistics.
Most econometrics is based on “frequentist” statistics
Bayesian statistics is built on the idea of prior probabilities, updating beliefs, and forming posterior probabilities using Bayes Theorem.
Bayes Theorem
Bayes Theorem is a different way to write conditional probability. It is a way to “flip” the conditioning of probabilities.
\[
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
\]
Where does it come from?
We can derive it from the definition of conditional probability:
\[
P(A|B) = \frac{P(A \cap B)}{P(B)}
\]
But we can also write:
\[
P(B|A) = \frac{P(B \cap A)}{P(A)}
\]
Rearranging this gives:
\[
P(A \cap B) = P(B|A)P(A)
\]
Bayes Theorem
\[
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
\]
So Bayes theorem tells us that we can use what we know about the new information (B) to update our beliefs about A.
Since we have knowledge about B, we can use that to get a better estimate of A.
Bayesian Thinking
Bayesians contend that Bayesian thinking and Bayes Theorem are a lot like how people actually think:
```{mermaid}
flowchart LR
    A[I have a belief about something] --> B(I am confronted with some information) --> C{I update my beliefs} --> A
```
And hopefully, I get closer to the “truth”
Bayes Rule/Theorem
Updating of beliefs is done with the formula:
\[
P(A|B) = \frac{P(B|A)P(A)}{P(B)}
\]
Where \(P(B)\) is then calculated using the law of total probability:
\[
P(B) = P(A)P(B|A) + (1-P(A))P(B|\neg A)
\]
or, with events \(A_1, \ldots, A_k\) partitioning the sample space: \(P(B) = \sum_{i=1}^{k} P(A_i)P(B|A_i)\). This is the Law of Total Probability.
The Law of Total Probability
The Law of Total Probability is the idea that to get the probability of an event, you can sum up lots of conditional probabilities weighted by what you’re conditioning on.
Let’s say that we have some preliminary information about whether the senator of a state will win an election:
\[
P(win) = .7
\]
\[
P(lose) = 1 - P(win) = .3
\]
New Information
Some news comes in that suggests the Senator is a cat person. This takes over the news cycle for 48 hours as cat food manufacturers begin investing heavily in the campaign, while the Dog lobby begins frantically carrying out opposition research. How should we update our beliefs on the outcome of the election?
All we know is that, based on previous data, candidates who won an election tended to be cat people with probability .2, while candidates who lost tended to be cat people with probability .9.
\[
P(\text{cat person} | win) = .2
\]
\[
P(\text{cat person} | lose) = .9
\]
Solving this
Our goal is then to find the new probability, \(P(win | cat person)\).
So based on this information, we have gone from a probability of winning of .7 \(\rightarrow\) .34.
Don’t be a cat person if you have political ambitions!
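The update from .7 to .34 can be reproduced directly with the law of total probability and Bayes Theorem:

```python
# Prior beliefs about the election
p_win = 0.7
p_lose = 1 - p_win

# Likelihoods: probability of being a cat person given the outcome
p_cat_given_win = 0.2
p_cat_given_lose = 0.9

# Law of total probability: P(cat person)
p_cat = p_cat_given_win * p_win + p_cat_given_lose * p_lose  # 0.41

# Bayes Theorem: P(win | cat person)
p_win_given_cat = p_cat_given_win * p_win / p_cat

print(round(p_win_given_cat, 2))  # 0.34
```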
Bayesian Networks
Graphical models that represent the probabilistic relationships among a set of variables.
Can model more complex systems where variables are interdependent and uncertainty is present.
Each node in the network represents a random variable, and the directed edges between nodes represent conditional dependencies.
Bayesian networks are widely used in various fields, including machine learning, artificial intelligence, and decision analysis, to perform tasks such as inference, prediction, and decision-making under uncertainty.
We will not go in-depth on this, but the basics of Bayesian networks are useful for understanding causal inference later in the course.
Example
A Bayesian network encodes information about the joint probability distribution of a set of random variables.
The information from the nodes (variables) and edges (dependencies) can be used to break down complex joint probability distributions into simpler conditional probabilities.
```{mermaid}
graph LR
    T[Tax Cuts]
    Y[GDP Growth]
    S[Sentiment]
    I[Investment]
    T --> I
    S --> I
    I --> Y
```
We can see here that tax cuts affect investment in the economy, and sentiment affects investment as well. Investment then feeds into GDP growth.
Bayesian Networks
The joint probability distribution for this network is:
\[
P(T, Y, S, I)
\]
This is the same as saying \(P(T \cap Y \cap S \cap I)\), which is the probability of all of these events happening together.
How do we break this up using the definition of conditional probability?
Using the structure of the network, we can break this down into conditional probabilities:
\[
P(T, Y, S, I) = P(T)P(S)P(I|T,S)P(Y|I)
\]
Let’s break this down
If all these variables were independent of each other, we could write it as:
\[
P(T, Y, S, I) = P(T)P(S)P(I)P(Y)
\]
But the network here tells us that there are dependencies between the variables.
Start from the chain rule, factoring the joint in the order \(T\), \(S\), \(I\), \(Y\):
\[
P(T, Y, S, I) = P(T)P(S|T)P(I|T,S)P(Y|T,S,I)
\]
Tax cuts come first in the ordering, so \(P(T)\) needs no conditioning.
Sentiment shares no edge or common ancestor with tax cuts, so it is independent of them:
\[
P(S|T) = P(S)
\]
Investment depends on its parents, tax cuts and sentiment, so its factor \(P(I|T,S)\) cannot be simplified further.
Finally, GDP growth depends only on investment:
\[
P(Y|T,S,I) = P(Y|I)
\]
Example: Rain and Wet Grass
One day you wake up, and you look out the window at your lawn. You see that the grass is wet. You wonder whether it rained last night. You know that you live in a dry place, and the probability of rain is about 20%. When it rains, the probability that the grass gets wet is 90%. However, sometimes the morning dew makes the grass wet, even if it didn’t rain. The probability of dew making the grass wet is 20%. So now you ask: given that the grass is wet, what is the probability that it rained last night?
This leads to a pretty simple Bayesian network:
```{mermaid}
graph LR
    R[Rain]
    G[Wet Grass]
    R --> G
```
This can be solved in the same way that we solved the previous example with the senator and cat people.
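Working the numbers from the story the same way (assuming dew is the only other cause of wet grass, so \(P(\text{wet} \mid \text{no rain}) = 0.2\)):

```python
p_rain = 0.2
p_wet_given_rain = 0.9
p_wet_given_no_rain = 0.2   # morning dew

# Law of total probability: P(wet grass)
p_wet = p_wet_given_rain * p_rain + p_wet_given_no_rain * (1 - p_rain)  # 0.34

# Bayes Theorem: P(rain | wet grass)
p_rain_given_wet = p_wet_given_rain * p_rain / p_wet

print(round(p_rain_given_wet, 2))  # 0.53
```

So seeing wet grass raises the probability of rain from .2 to roughly .53.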
More Complicated Example
What if we also have a sprinkler?
Let’s say that the sprinkler is on a timer, so it doesn’t “know” if it has rained or not.
Let’s say that morning dew is not a factor anymore.
What does this imply about what the resulting network would look like?
What would it look like if the sprinkler had a sensor to know if it had rained?
More Complicated Example
So let’s say you wake up and see that the grass was wet. It isn’t clear whether it was rain or the sprinkler that made the grass wet.
The probability of rain is still 20%.
The probability that the sprinkler is on is 10%.
We can also say that if it rains OR the sprinkler is on, the probability that the grass is wet is 100%.
What does this mean in terms of probabilities?
If even one of those events happens, the grass is wet with probability 1.
So \(P(W=1|R=1,S=0) = 1\), \(P(W=1|R=0,S=1) = 1\), and \(P(W=1|R=1,S=1) = 1\)
But if neither happens, the grass is dry with probability 1.
So \(P(W=1|R=0,S=0) = 0\)
So what’s the probability of it being wet (\(P(W=1)\))?
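One way to answer this is to enumerate the joint distribution of the little network; a sketch assuming rain and the timed sprinkler are independent, per the setup above:

```python
from itertools import product

p_rain = 0.2
p_sprinkler = 0.1

# P(W=1 | R, S): the grass is wet exactly when it rained or the sprinkler ran
def p_wet_given(r, s):
    return 1.0 if (r or s) else 0.0

# Sum the joint P(R=r, S=s, W=1) over all four rain/sprinkler combinations
p_wet = 0.0
for r, s in product([0, 1], repeat=2):
    p_r = p_rain if r else 1 - p_rain
    p_s = p_sprinkler if s else 1 - p_sprinkler
    p_wet += p_r * p_s * p_wet_given(r, s)

print(round(p_wet, 2))  # 0.28
```

Equivalently, the grass is dry only when neither event happens: \(P(W=1) = 1 - (.8)(.9) = .28\).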