Bayes' Theorem

Definition

Bayes' theorem is a mathematical formula that gives the conditional probability of an event based on prior knowledge of related events.

The standard form is:

$P(A|B)=\frac{P(B|A)\times P(A)}{P(B)}$

Where:

$P(A|B)$ = probability of event $A$ occurring given that $B$ has occurred
$P(B|A)$ = probability of event $B$ occurring given that $A$ has occurred
$P(A)$ = prior probability of event $A$
$P(B)$ = total (marginal) probability of event $B$, which must be greater than 0

This formula is used to reverse conditional probability. In many practical problems, we know the probability of evidence given a cause and want the probability of the cause given the evidence.

Main Content

1. First Concept: Conditional Probability

Conditional probability means the probability of one event happening when another event is already known to have happened.
It is written as:

$P(A|B)=\frac{P(A\cap B)}{P(B)}$

provided that $P(B) > 0$.

Explanation

Conditional probability is the foundation of Bayes' theorem. It tells us how the probability of an event changes once we know that another event has occurred.

For example, suppose we draw a card from a standard 52-card deck. Without any other information, the probability that it is a king is $4/52$. If we know the card is a face card, that probability changes, because the sample space is now restricted to the 12 face cards.

Example

Let:

Event $A$ = "the card is a king"
Event $B$ = "the card is a face card"

Then:

$P(A)=\frac{4}{52}$
$P(B)=\frac{12}{52}$
$P(A\cap B)=\frac{4}{52}$ (every king is a face card, so $A\cap B=A$)

So,

$P(A|B)=\frac{P(A\cap B)}{P(B)}=\frac{4/52}{12/52}=\frac{4}{12}=\frac{1}{3}$

This means that if a card is known to be a face card, the probability that it is a king is $1/3$.
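The counting argument can be checked by brute force. This is a minimal sketch that assumes a deck represented as (rank, suit) pairs, a representation chosen here purely for illustration. It restricts the sample space to face cards and counts kings, then confirms that the standard form of Bayes' theorem gives the same answer, using $P(B|A)=1$ because every king is a face card.

```python
from fractions import Fraction
from itertools import product

# Illustrative deck representation (an assumption): (rank, suit) pairs.
RANKS = ["A", "2", "3", "4", "5", "6", "7", "8", "9", "10", "J", "Q", "K"]
SUITS = ["clubs", "diamonds", "hearts", "spades"]
deck = list(product(RANKS, SUITS))

def is_king(card):
    return card[0] == "K"

def is_face(card):
    return card[0] in ("J", "Q", "K")

# Direct conditional probability: restrict the sample space to face cards.
face_cards = [c for c in deck if is_face(c)]
p_king_given_face = Fraction(sum(is_king(c) for c in face_cards), len(face_cards))

# Same quantity via Bayes' theorem: P(A|B) = P(B|A) P(A) / P(B).
kings = [c for c in deck if is_king(c)]
p_king = Fraction(len(kings), len(deck))
p_face = Fraction(len(face_cards), len(deck))
p_face_given_king = Fraction(sum(is_face(c) for c in kings), len(kings))
p_bayes = p_face_given_king * p_king / p_face

print(p_king_given_face, p_bayes)  # 1/3 1/3
```

Using `Fraction` keeps the arithmetic exact, so the result matches the hand calculation $4/12=1/3$ with no rounding.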

Why it matters

Conditional probability is the bridge between an unconditional probability and one updated by known information. Bayes' theorem is derived directly from its definition, so without it Bayes' theorem could not be stated.
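The derivation takes two lines: apply the definition of conditional probability to both $P(A|B)$ and $P(B|A)$, then eliminate the shared term $P(A\cap B)$. This assumes $P(A)>0$ and $P(B)>0$ so that both conditional probabilities are defined.

$P(A|B)=\frac{P(A\cap B)}{P(B)} \quad\Rightarrow\quad P(A\cap B)=P(A|B)\,P(B)$

$P(B|A)=\frac{P(A\cap B)}{P(A)} \quad\Rightarrow\quad P(A\cap B)=P(B|A)\,P(A)$

$P(A|B)\,P(B)=P(B|A)\,P(A) \quad\Rightarrow\quad P(A|B)=\frac{P(B|A)\,P(A)}{P(B)}$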

2. Second Concept: Prior, Likelihood, and Posterior

Prior probability is the initial belief about an event before new evidence is observed.

Likelihood is the probability of observing the evidence assuming the event is true.

Posterior probability is the updated probability after observing the evidence.

Explanation

Bayes' theorem is often understood using these three ideas:

Prior: what we believed before
Likelihood: how likely the new evidence is under that belief
Posterior: what we believe after seeing the evidence

Bayes' theorem combines these into a single update rule:

$\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$

The result is then normalized by dividing by the total probability of the evidence, so that the posterior probabilities of all possible explanations sum to 1.

Example: Medical Test

Suppose a disease affects 1% of the population.

Prior probability of disease: $P(D)=0.01$

Suppose a test has:

95% sensitivity: $P(+|D)=0.95$
90% specificity: $P(-|\text{not }D)=0.90$

That means the false positive rate is: $P(+|\text{not }D)=1-0.90=0.10$

Now if a person tests positive, we want the posterior probability that they actually have the disease:

$P(D|+)=\frac{P(+|D)P(D)}{P(+)}$

First compute $P(+)$ by adding the two ways a positive result can occur (true positives and false positives):

$P(+)=P(+|D)P(D)+P(+|\text{not }D)P(\text{not }D)$

$P(+)=0.95(0.01)+0.10(0.99)=0.0095+0.099=0.1085$

Now:

$P(D|+)=\frac{0.0095}{0.1085}\approx 0.0876$

So even after a positive test, the probability of having the disease is about 8.76%, not 95%.
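The same calculation can be phrased as the update rule "posterior is proportional to likelihood times prior, then normalize". The sketch below is one way to write that rule for any number of mutually exclusive, exhaustive hypotheses; the function name `bayes_update` and the dictionary layout are illustrative choices, not a standard API.

```python
def bayes_update(priors, likelihoods):
    """Return (posteriors, P(evidence)).

    priors:      hypothesis -> P(H); hypotheses must be mutually exclusive and exhaustive
    likelihoods: hypothesis -> P(evidence | H)
    """
    # Posterior is proportional to likelihood x prior.
    unnormalized = {h: likelihoods[h] * priors[h] for h in priors}
    # The normalizing constant is the total probability of the evidence.
    evidence = sum(unnormalized.values())
    if evidence == 0:
        raise ValueError("the evidence has probability 0 under every hypothesis")
    posteriors = {h: value / evidence for h, value in unnormalized.items()}
    return posteriors, evidence

priors = {"disease": 0.01, "no disease": 0.99}
likelihoods = {"disease": 0.95, "no disease": 0.10}  # P(+|D) and P(+|not D)
posteriors, p_positive = bayes_update(priors, likelihoods)
print(f"P(+) = {p_positive:.4f}, P(D|+) = {posteriors['disease']:.4f}")
# P(+) = 0.1085, P(D|+) = 0.0876
```

Working with unnormalized products first means the denominator never has to be written out separately: it is simply the sum over all hypotheses.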

Why this is important

This example shows that the posterior depends not only on the test accuracy but also on the prior probability. If a disease is rare, many positive results may still be false positives.
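To see how strongly the prior drives the result, the sketch below keeps the test's sensitivity and specificity fixed at the values above and varies only the prevalence. The prevalence values in the loop are illustrative choices, not figures from the example.

```python
def posterior_given_positive(prevalence, sensitivity=0.95, specificity=0.90):
    """P(D | +) for a yes/no test; the defaults are the test figures used above."""
    false_positive_rate = 1 - specificity
    true_positives = sensitivity * prevalence                  # P(+ | D) P(D)
    false_positives = false_positive_rate * (1 - prevalence)  # P(+ | not D) P(not D)
    return true_positives / (true_positives + false_positives)

# Same test accuracy, different priors (illustrative prevalence values).
for prevalence in (0.001, 0.01, 0.1, 0.5):
    print(f"prevalence {prevalence}: P(D|+) = {posterior_given_positive(prevalence):.3f}")
```

At 1% prevalence this reproduces the posterior of about 0.0876, while at 50% prevalence the same positive result gives $0.475/0.525\approx 0.905$.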

3. Third Concept: Total Probability and Reversal of Cause and Effect

Bayes' theorem uses the law of total probability to calculate the overall chance of evidence.
It reverses conditional reasoning: it turns "evidence given cause", $P(B|A)$, into "cause given evidence", $P(A|B)$.
This reversal is useful in classification, diagnosis, and inference problems.

Explanation

Often we know the probability of observing some data under different possible causes. Bayes' theorem lets us estimate which cause is most likely after seeing the data.

If events $A_1, A_2, A_3, \dots, A_n$ form a partition of the sample space (they are mutually exclusive and together cover every possible outcome), then:

$P(B)=\sum_{i=1}^{n}P(B|A_i)P(A_i)$

So Bayes' theorem can be written as:

$P(A_i|B)=\frac{P(B|A_i)P(A_i)}{\sum_{j=1}^{n}P(B|A_j)P(A_j)}$

This form is very important when there are multiple possible explanations for the same evidence.
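The partition form translates almost line for line into code. This is a minimal sketch assuming priors and likelihoods are given as parallel lists indexed by $i$; the function names `total_probability` and `posterior_over_partition` are hypothetical. Note that only the "sums to 1" property of a partition can be checked from the numbers; disjointness of the events has to be ensured by how they are defined.

```python
import math

def total_probability(priors, likelihoods):
    """Law of total probability: P(B) = sum_i P(B | A_i) P(A_i).

    priors[i] is P(A_i) for a partition A_1..A_n; likelihoods[i] is P(B | A_i).
    """
    if len(priors) != len(likelihoods):
        raise ValueError("need exactly one likelihood per event in the partition")
    # The probabilities of a partition must sum to 1 (disjointness cannot be checked here).
    if not math.isclose(sum(priors), 1.0):
        raise ValueError("priors do not sum to 1")
    return sum(p_b_given_a * p_a for p_b_given_a, p_a in zip(likelihoods, priors))

def posterior_over_partition(priors, likelihoods):
    """Bayes' theorem for every A_i at once: P(A_i | B) = P(B | A_i) P(A_i) / P(B)."""
    p_b = total_probability(priors, likelihoods)
    return [p_b_given_a * p_a / p_b for p_b_given_a, p_a in zip(likelihoods, priors)]
```

For example, `total_probability([0.50, 0.30, 0.20], [0.02, 0.03, 0.05])` evaluates to approximately 0.029 in floating point, and the posteriors returned by `posterior_over_partition` always sum to 1.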

Example: Faulty Machine

Suppose a factory has three machines producing items:

Machine 1 produces 50% of the items
Machine 2 produces 30% of the items
Machine 3 produces 20% of the items

Their defect rates are:

$P(D|M_1)=0.02$
$P(D|M_2)=0.03$
$P(D|M_3)=0.05$

If a defective item is found, what is the probability it came from Machine 3?

First calculate total defect probability:

$P(D)=0.02(0.50)+0.03(0.30)+0.05(0.20)$

$P(D)=0.01+0.009+0.01=0.029$

Now:

$P(M_3|D)=\frac{P(D|M_3)P(M_3)}{P(D)}$

$P(M_3|D)=\frac{0.05\times 0.20}{0.029}=\frac{0.01}{0.029}\approx 0.345$

So there is about a 34.5% chance the defective item came from Machine 3.
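Computing the posterior for all three machines, not just Machine 3, shows the full "updated probabilities" picture. The sketch below uses exact fractions so the results can be compared with the hand calculation without rounding; the dictionary names are illustrative.

```python
from fractions import Fraction

# P(M_i): share of production; P(D | M_i): defect rate (figures from the example).
share = {"M1": Fraction(50, 100), "M2": Fraction(30, 100), "M3": Fraction(20, 100)}
defect_rate = {"M1": Fraction(2, 100), "M2": Fraction(3, 100), "M3": Fraction(5, 100)}

# Law of total probability: P(D) = sum_i P(D | M_i) P(M_i).
p_defect = sum(defect_rate[m] * share[m] for m in share)

# Bayes' theorem for each machine: P(M_i | D) = P(D | M_i) P(M_i) / P(D).
posterior = {m: defect_rate[m] * share[m] / p_defect for m in share}

print(p_defect)  # 29/1000
for m, p in posterior.items():
    print(m, p, f"{float(p):.3f}")
```

Machine 1 and Machine 3 both come out at $10/29\approx 0.345$ and Machine 2 at $9/29\approx 0.310$: Machine 3's higher defect rate is exactly offset by Machine 1's larger share of production.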

Visual flow

Given evidence $B$, Bayes' theorem helps determine the most likely cause:

Possible causes -> Evidence observed -> Updated probabilities
A1, A2, A3     ->     B            ->   P(A1|B), P(A2|B), P(A3|B)

This is the central idea behind Bayesian inference.

Working / Process

1. Identify the event and the evidence

Decide what the unknown event is, such as having a disease, choosing a machine, or being in a particular category.
Identify the evidence or observation that has been found.

2. Find the prior probability and likelihood

Determine the prior probability of the event before seeing evidence.
Determine the probability of the evidence assuming the event is true.

3. Apply Bayes' theorem and compute the posterior

Use: $P(A|B)=\frac{P(B|A)P(A)}{P(B)}$
If needed, compute $P(B)$ using total probability.
Simplify the result to obtain the updated probability.

Example process

For a medical test:

Step 1: Event = disease present, evidence = positive test
Step 2: Use prevalence as prior and test accuracy as likelihood
Step 3: Calculate posterior probability of disease after a positive result
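The three steps can be wrapped in a small function for a yes/no event. This is a sketch under the assumption that the evidence is described by two numbers, $P(B|A)$ and $P(B|\text{not }A)$; the parameter name `false_alarm_rate` for the second one is a hypothetical label, not standard terminology from the text.

```python
def bayes_binary(prior, likelihood, false_alarm_rate):
    """Posterior P(A | B) for a yes/no event A.

    prior            = P(A)
    likelihood       = P(B | A)
    false_alarm_rate = P(B | not A)   (hypothetical parameter name)
    """
    # Step 2: check that the prior and likelihoods are valid probabilities.
    for name, p in (("prior", prior), ("likelihood", likelihood),
                    ("false_alarm_rate", false_alarm_rate)):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {p}")
    # Step 3a: total probability of the evidence over A and not A.
    evidence = likelihood * prior + false_alarm_rate * (1 - prior)
    if evidence == 0:
        raise ValueError("P(B) is zero, so P(A | B) is undefined")
    # Step 3b: Bayes' theorem.
    return likelihood * prior / evidence

# Step 1: event = disease present, evidence = positive test.
# Step 2: prior = prevalence, likelihood = sensitivity, false alarm = 1 - specificity.
print(f"{bayes_binary(prior=0.01, likelihood=0.95, false_alarm_rate=0.10):.4f}")  # 0.0876
```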

General computation structure

$\text{Posterior} = \frac{\text{Likelihood} \times \text{Prior}}{\text{Evidence}}$

Another simple illustration

Prior belief + New data -> Updated belief

This updating process can be repeated whenever more data becomes available: the posterior from one update serves as the prior for the next.
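Repeated updating can be sketched with the medical-test figures. The key assumption, which the example above does not state, is that repeated test results are conditionally independent given disease status; under that assumption each positive result can be folded in one at a time, and the result matches a single update using the combined likelihoods $0.95^2$ and $0.10^2$.

```python
def update(prior, likelihood, false_alarm_rate):
    """One Bayesian update for a yes/no hypothesis; returns the posterior."""
    evidence = likelihood * prior + false_alarm_rate * (1 - prior)
    return likelihood * prior / evidence

# Assumption: repeated positive tests are conditionally independent given disease status.
belief = 0.01  # prior: prevalence from the medical-test example
for test_number in (1, 2):
    belief = update(belief, likelihood=0.95, false_alarm_rate=0.10)
    print(f"after positive test {test_number}: P(D) = {belief:.4f}")

# Under the same assumption, one update with the combined likelihoods gives the same answer.
batch = update(0.01, likelihood=0.95 ** 2, false_alarm_rate=0.10 ** 2)
print(f"single combined update: P(D) = {batch:.4f}")
```

After one positive result the belief is about 0.088; after a second it rises to $0.009025/0.018925\approx 0.477$, showing how each new piece of evidence moves the estimate.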

Advantages / Applications

Medical diagnosis

Helps estimate the actual probability of a disease after test results, especially when diseases are rare.

Machine learning and AI

Used in Naive Bayes classifiers, spam filtering, text categorization, and predictive modeling.

Decision-making under uncertainty

Helps make better choices when outcomes are uncertain and information is incomplete.

Fault detection and quality control

Used to identify likely causes of defects or failures in manufacturing systems.

Risk assessment and finance

Helps evaluate probabilities of loss, default, fraud, and other uncertain events.

Legal and forensic analysis

Assists in updating the probability of hypotheses based on evidence.

Scientific inference

Used in research to revise hypotheses as new data is collected.

Weather forecasting

Helps improve predictions by updating beliefs based on observed atmospheric conditions.

Summary

Bayes' theorem is a rule for updating probability using new evidence.
It connects prior probability, likelihood, and posterior probability.
It is useful for reversing conditional probability in real-world problems.
Important terms to remember: prior, likelihood, posterior, conditional probability, total probability.