The Multinomial Distribution
Definition
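If an experiment is repeated $n$ times independently, and each trial results in exactly one of $k$ mutually exclusive categories with probabilities
$$p_1, p_2, \ldots, p_k, \qquad p_1 + p_2 + \cdots + p_k = 1,$$
then the random vector
$$(X_1, X_2, \ldots, X_k)$$
has a multinomial distribution if $X_i$ denotes the number of times category $i$ occurs in the $n$ trials.
The probability mass function is:
$$P(X_1 = x_1, X_2 = x_2, \ldots, X_k = x_k) = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$$
subject to
$$x_1 + x_2 + \cdots + x_k = n$$
and $x_i \in \{0, 1, \ldots, n\}$ for all $i$.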
This distribution describes the probabilities of all possible ways the trials can be distributed among the categories.
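As a minimal sketch of the probability mass function above, the Python function below evaluates it directly with the standard library. The name `multinomial_pmf` is an illustrative choice, not a library API, and the function assumes the probabilities already sum to 1 rather than checking it.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing exactly `counts` in n = sum(counts) trials.

    counts: non-negative integers x_1, ..., x_k
    probs:  category probabilities p_1, ..., p_k (assumed to sum to 1)
    """
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    if any(x < 0 for x in counts):
        raise ValueError("counts must be non-negative")
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! x_2! ... x_k!), exact integer division
    coeff = factorial(n) // prod(factorial(x) for x in counts)
    # Probability of one particular sequence with these counts
    seq_prob = prod(p ** x for p, x in zip(probs, counts))
    return coeff * seq_prob

# Fair die rolled 3 times: one 1, one 2, one 3
print(multinomial_pmf([1, 1, 1, 0, 0, 0], [1 / 6] * 6))  # about 0.0278
```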
Main Content
1. First Concept: Multinomial Experiment and Trial Outcomes
- Each trial has a fixed number of possible outcomes, and exactly one outcome occurs in every trial.
- The outcomes must be mutually exclusive and collectively exhaustive, meaning no two outcomes can happen at the same time and one of them must happen each time.
In a multinomial setting, the experiment is usually characterized by:
- A fixed number of trials
- Independent repetitions of the same experiment
- A set of categories or outcomes
- Constant probabilities for each outcome across all trials
For example, when rolling a fair die 10 times, each roll has 6 possible outcomes: $\{1, 2, 3, 4, 5, 6\}$. If we want to know the probability of getting exactly two 1s, one 2, three 3s, and four 4s, with zero 5s and zero 6s, we are dealing with a multinomial model.
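To make the die example concrete, here is a small exact calculation using `fractions.Fraction` so no rounding occurs. It assumes the fair die stated in the text; the variable names are only illustrative.

```python
from fractions import Fraction
from math import factorial, prod

# Counts for faces 1..6: two 1s, one 2, three 3s, four 4s, zero 5s, zero 6s
counts = [2, 1, 3, 4, 0, 0]
probs = [Fraction(1, 6)] * 6  # fair die

n = sum(counts)  # 10 rolls
coeff = factorial(n) // prod(factorial(x) for x in counts)  # number of orderings
prob = coeff * prod(p ** x for p, x in zip(probs, counts))

print(coeff)        # 12600
print(prob)         # 175/839808
print(float(prob))  # about 0.000208
```

Using exact fractions keeps the result readable as a ratio; converting to `float` at the end gives the decimal approximation.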
A simple way to visualize the structure is:
Trial 1 -> one of k categories
Trial 2 -> one of k categories
Trial 3 -> one of k categories
...
Trial n -> one of k categories
The important idea is that the multinomial distribution does not track the exact order of outcomes; it tracks only the total counts in each category.
Example:
- Suppose a survey asks 100 people to choose one favorite fruit from apples, bananas, or oranges.
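- If 40 choose apples, 35 choose bananas, and 25 choose oranges, then the counts $(40, 35, 25)$ are one possible outcome of a multinomial distribution with $n = 100$ and $k = 3$, assuming people choose independently with the same probabilities.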
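The idea that only counts matter can be illustrated by simulating individual trials and then discarding their order. In this sketch the preference probabilities `probs` are hypothetical values chosen for illustration (the text does not give them), and `simulate_counts` is an illustrative helper name.

```python
import random
from collections import Counter

categories = ["apples", "bananas", "oranges"]
# Hypothetical preference probabilities, for illustration only
probs = [0.4, 0.35, 0.25]

def simulate_counts(n, categories, probs, rng):
    """Run n independent trials and return only the per-category counts."""
    sequence = rng.choices(categories, weights=probs, k=n)  # ordered outcomes
    tally = Counter(sequence)                               # order is discarded
    return [tally[c] for c in categories]

rng = random.Random(0)  # seeded so the run is reproducible
counts = simulate_counts(100, categories, probs, rng)
print(counts, sum(counts))  # the counts always sum to 100
```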
2. Second Concept: Probability Mass Function and Counting Arrangements
- The multinomial probability formula combines two parts: the number of possible arrangements and the probability of one particular arrangement.
- The term $\frac{n!}{x_1!\,x_2!\cdots x_k!}$ counts how many different sequences produce the same category counts.
Why this matters: If outcomes are observed only as counts, then many different sequences can lead to the same result. For example, with 3 trials and categories A and B, the counts (2 A's, 1 B) can occur in three orders: AAB, ABA, BAA. The multinomial coefficient counts all such arrangements.
For $k$ categories, the number of sequences of length $n$ with counts $x_1, x_2, \ldots, x_k$ is:
$$\binom{n}{x_1, x_2, \ldots, x_k} = \frac{n!}{x_1!\,x_2!\cdots x_k!}$$
This is called the multinomial coefficient.
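One way to check the multinomial coefficient is to list the distinct orderings by brute force and compare with the formula. This is only a sketch for small $n$, since it generates all $n!$ permutations; the helper names are illustrative.

```python
from itertools import permutations
from math import factorial, prod

def multinomial_coefficient(counts):
    """n! / (x_1! ... x_k!) with n = sum(counts)."""
    return factorial(sum(counts)) // prod(factorial(x) for x in counts)

def count_distinct_orders(counts, labels):
    """Brute force: count distinct sequences with the given category counts."""
    pool = [label for label, x in zip(labels, counts) for _ in range(x)]
    return len(set(permutations(pool)))

print(sorted("".join(s) for s in set(permutations("AAB"))))  # ['AAB', 'ABA', 'BAA']
print(multinomial_coefficient([2, 1]))                        # 3
print(count_distinct_orders([2, 1, 2], "ABC"))                # 30
print(multinomial_coefficient([2, 1, 2]))                     # 30
```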
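The full probability formula is:
$$P(X_1 = x_1, \ldots, X_k = x_k) = \frac{n!}{\prod_{i=1}^{k} x_i!}\,\prod_{i=1}^{k} p_i^{x_i}$$
where $\prod_{i=1}^{k}$ means multiply the terms for $i = 1, \ldots, k$ together.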
Example: Suppose a die is fair, so each face has probability $1/6$. If we roll it 3 times, what is the probability of getting:
- one 1
- one 2
- one 3
and zero for the other faces?
Then:
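- $n = 3$, $x_1 = x_2 = x_3 = 1$, and the remaining counts are 0
- $p_i = 1/6$ for all faces
So:
$$P = \frac{3!}{1!\,1!\,1!\,0!\,0!\,0!}\left(\frac{1}{6}\right)^{3} = 6 \cdot \frac{1}{216} = \frac{1}{36} \approx 0.0278$$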
This works because there are 6 sequences of length 3 that contain one 1, one 2, and one 3 in any order.
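The "6 sequences" argument can be confirmed by enumerating all $6^3 = 216$ equally likely ordered outcomes and keeping those with one 1, one 2, and one 3. This is a brute-force sketch, practical only for small numbers of rolls.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
target = {1: 1, 2: 1, 3: 1}  # one 1, one 2, one 3; other faces zero

def matches(seq):
    """True if the roll sequence has exactly the target count for every face."""
    return all(seq.count(face) == target.get(face, 0) for face in faces)

sequences = list(product(faces, repeat=3))   # all 216 ordered outcomes
hits = [s for s in sequences if matches(s)]  # the orderings of (1, 2, 3)

prob = Fraction(len(hits), len(sequences))   # each sequence has probability 1/216
print(len(hits), prob)  # 6 1/36
```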
3. Third Concept: Properties, Mean, and Variance
- Each category count $X_i$ is a random variable, and the sum of all category counts equals the total number of trials:
$$X_1 + X_2 + \cdots + X_k = n$$
- The expected value and variability of each count can be determined from the probabilities.
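For a multinomial distribution:
$$E(X_i) = n p_i$$
This means the average number of times category $i$ appears is the total number of trials multiplied by the probability of that category.
The variance of each count is:
$$\operatorname{Var}(X_i) = n p_i (1 - p_i)$$
The covariance between two different categories $i$ and $j$ is:
$$\operatorname{Cov}(X_i, X_j) = -n p_i p_j, \qquad i \ne j$$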
This negative covariance makes sense because the total is fixed at $n$: if one category count increases, the sum of the other counts must decrease by the same amount.
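The mean, variance, and covariance formulas can be checked exactly for a small case by summing over every possible count vector. The sketch below uses $n = 5$ and example probabilities $(1/2, 1/3, 1/6)$ chosen only for illustration; `compositions` and `pmf` are illustrative helper names.

```python
from fractions import Fraction
from math import factorial, prod

def compositions(n, k):
    """All count vectors (x_1, ..., x_k) of non-negative integers summing to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        for rest in compositions(n - first, k - 1):
            yield (first,) + rest

def pmf(counts, probs):
    coeff = factorial(sum(counts)) // prod(factorial(x) for x in counts)
    return coeff * prod(p ** x for p, x in zip(probs, counts))

n = 5
probs = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]  # example values
support = list(compositions(n, len(probs)))

total = sum(pmf(c, probs) for c in support)  # should be exactly 1
mean_0 = sum(pmf(c, probs) * c[0] for c in support)
mean_1 = sum(pmf(c, probs) * c[1] for c in support)
var_0 = sum(pmf(c, probs) * (c[0] - mean_0) ** 2 for c in support)
cov_01 = sum(pmf(c, probs) * (c[0] - mean_0) * (c[1] - mean_1) for c in support)

assert total == 1
assert mean_0 == n * probs[0]                       # E(X_i) = n p_i
assert var_0 == n * probs[0] * (1 - probs[0])       # Var(X_i) = n p_i (1 - p_i)
assert cov_01 == -n * probs[0] * probs[1]           # Cov(X_i, X_j) = -n p_i p_j
print(mean_0, var_0, cov_01)  # 5/2 5/4 -5/6
```

Exact fractions make the equalities hold without a floating-point tolerance.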
Important consequences:
- The category counts are not independent, even though the individual trials are independent.
- The counts are linked by the total sum constraint.
- The multinomial distribution generalizes the binomial distribution:
- If $k = 2$, it becomes the binomial distribution.
- For two categories, one count determines the other: $X_2 = n - X_1$.
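As a quick numerical check of the $k = 2$ case, the sketch below compares the multinomial formula with the usual binomial formula $\binom{n}{x} p^x (1-p)^{n-x}$ for every possible count. The values $n = 8$ and $p = 0.3$ are arbitrary illustrative choices.

```python
from math import comb, factorial, prod

def multinomial_pmf(counts, probs):
    coeff = factorial(sum(counts)) // prod(factorial(x) for x in counts)
    return coeff * prod(p ** x for p, x in zip(probs, counts))

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 8, 0.3  # arbitrary illustrative values
for x in range(n + 1):
    # With k = 2 the second count is forced: x_2 = n - x
    assert abs(multinomial_pmf([x, n - x], [p, 1 - p]) - binomial_pmf(x, n, p)) < 1e-12

print("k = 2 multinomial matches binomial for n =", n)
```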
Example: If a student answers 20 multiple-choice questions by random guessing, and each question has 4 equally likely choices, then:
- The expected number of times each choice is selected is:
$$E(X_i) = n p_i = 20 \times \frac{1}{4} = 5$$
So, on average, each option should be chosen 5 times.
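The same example can be extended with the variance and covariance formulas from above. This sketch assumes, as the example does, that all 4 choices are equally likely on every question.

```python
from fractions import Fraction

n = 20                        # questions answered by guessing
probs = [Fraction(1, 4)] * 4  # assumes each of the 4 choices is equally likely

means = [n * p for p in probs]                # E(X_i) = n p_i
variances = [n * p * (1 - p) for p in probs]  # Var(X_i) = n p_i (1 - p_i)
cov = -n * probs[0] * probs[1]                # Cov(X_i, X_j) = -n p_i p_j

print(means[0], variances[0], cov)  # 5 15/4 -5/4
```

So each choice count has mean 5 and variance 3.75, and any two choice counts are negatively correlated.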
Working / Process
- Identify the number of trials and categories
Determine the total number of repeated trials $n$ and list all $k$ possible outcome categories. Check that the categories are mutually exclusive and that every trial produces exactly one category.
- Assign probabilities and observed counts
Write down the probability $p_i$ and the observed count $x_i$ for each category. Make sure the counts satisfy $x_1 + x_2 + \cdots + x_k = n$ and the probabilities satisfy $p_1 + p_2 + \cdots + p_k = 1$.
- Apply the multinomial formula
Use $P = \frac{n!}{x_1!\,x_2!\cdots x_k!}\,p_1^{x_1} p_2^{x_2} \cdots p_k^{x_k}$. First compute the multinomial coefficient, then multiply by the probability terms. This gives the probability of observing exactly those counts.
Example process:
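- A bag contains red, blue, and green balls with probabilities $p_R$, $p_B$, $p_G$, where $p_R + p_B + p_G = 1$.
- Three balls are drawn with replacement.
- Find the probability of counts $(x_R, x_B, x_G)$.
Step 1: $n = 3$, $k = 3$
Step 2: counts are $(x_R, x_B, x_G)$ with $x_R + x_B + x_G = 3$, probabilities are $(p_R, p_B, p_G)$
Step 3:
$$P = \frac{3!}{x_R!\,x_B!\,x_G!}\,p_R^{x_R}\,p_B^{x_B}\,p_G^{x_G}$$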
Advantages / Applications
- Useful for modeling real-life situations with more than two outcomes, such as dice rolls, survey choices, machine classification results, and genetic categories.
- Provides a clear mathematical way to compute probabilities of count patterns across multiple categories.
- Forms the basis for many statistical methods in data science, quality control, and hypothesis testing, especially when dealing with categorical data.
Common applications include:
- Marketing surveys with multiple response options
- Medical diagnostics with several possible diagnoses
- Text classification and machine learning
- Genetic inheritance and allele frequencies
- Quality control with defect types
- Games of chance involving more than two outcome types
Because it handles multiple categories in one model, the multinomial distribution is especially valuable whenever analysts care about the distribution of counts rather than the exact order of events.
Summary
- The multinomial distribution models counts across several categories after a fixed number of independent trials.
- Its probability formula uses a multinomial coefficient and category probabilities.
- It extends the binomial distribution to more than two outcomes.
- Important terms to remember: trial, category, count, multinomial coefficient, probability mass function.