The Multinomial Distribution

Definition

If an experiment is repeated $n$ times independently, and each trial results in exactly one of $k$ mutually exclusive categories with probabilities

$p_1, p_2, \dots, p_k \quad \text{where} \quad p_1 + p_2 + \cdots + p_k = 1,$

then the random vector

$(X_1, X_2, \dots, X_k)$

has a multinomial distribution with parameters $n$ and $p_1, \dots, p_k$, where $X_i$ denotes the number of times category $i$ occurs in the $n$ trials.

The probability mass function is:

$P(X_1=x_1, X_2=x_2, \dots, X_k=x_k) = \frac{n!}{x_1!x_2!\cdots x_k!} p_1^{x_1}p_2^{x_2}\cdots p_k^{x_k}$

subject to

$x_1+x_2+\cdots+x_k=n, \quad x_i \in \{0, 1, \dots, n\}$

for all $i$. For any other values of $x_1, \dots, x_k$ the probability is zero.

This distribution describes the probabilities of all possible ways the $n$ trials can be distributed among the $k$ categories.

Main Content

1. First Concept: Multinomial Experiment and Trial Outcomes

Each trial has a fixed number of possible outcomes, and exactly one outcome occurs in every trial.
The outcomes must be mutually exclusive and collectively exhaustive, meaning no two outcomes can happen at the same time and one of them must happen each time.

In a multinomial setting, the experiment is usually characterized by:

A fixed number of trials $n$
Independent repetitions of the same experiment
A set of categories or outcomes $1, 2, \dots, k$
Constant probabilities $p_1, p_2, \dots, p_k$ for each outcome across all trials

For example, when rolling a fair die 10 times, each roll has 6 possible outcomes: $1,2,3,4,5,6$. If we want to know the probability of getting exactly two 1s, one 2, three 3s, and four 4s, with zero 5s and zero 6s, we are dealing with a multinomial model.
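
As a worked check of this die example, applying the probability mass function from the Definition with $n=10$ and $p_i=1/6$ for every face gives:

$P = \frac{10!}{2!\,1!\,3!\,4!\,0!\,0!}\left(\frac{1}{6}\right)^{10} = \frac{12600}{60466176} = \frac{175}{839808} \approx 0.000208$

The coefficient $12600$ is the number of distinct roll sequences with these counts, and each such sequence has probability $(1/6)^{10}$.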

A simple way to visualize the structure is:

Trial 1  -> one of k categories
Trial 2  -> one of k categories
Trial 3  -> one of k categories
...
Trial n  -> one of k categories

The important idea is that the multinomial distribution does not track the exact order of outcomes; it tracks only the total counts in each category.
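
The point about order can be illustrated with a short Python sketch. The helper name `count_vector` and the category labels are assumptions made for illustration, not part of the original notes; the sketch only shows that two differently ordered sequences collapse to the same count vector.

```python
from collections import Counter


def count_vector(sequence, categories):
    """Tally a sequence of trial outcomes into per-category counts.

    The order of `sequence` is discarded; only the totals survive.
    """
    tally = Counter(sequence)
    return tuple(tally[c] for c in categories)


categories = ("A", "B", "C")          # hypothetical category labels
seq1 = ["A", "B", "A", "C", "A"]      # one ordering of five trials
seq2 = ["C", "A", "A", "A", "B"]      # a different ordering, same totals

print(count_vector(seq1, categories))  # (3, 1, 1)
print(count_vector(seq2, categories))  # (3, 1, 1)
```

Both sequences map to the same multinomial outcome, which is why the distribution is defined on counts rather than on sequences.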

Example:

Suppose a survey asks 100 people to choose one favorite fruit from apples, bananas, or oranges.
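If 40 choose apples, 35 choose bananas, and 25 choose oranges, then the counts $(40,35,25)$ are one possible outcome of a multinomial random vector, provided the respondents answer independently and with the same choice probabilities.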
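
A survey like this can be simulated to see how the counts vary from sample to sample. The sketch below is only an illustration: the function name `sample_counts` and the choice probabilities $0.40, 0.35, 0.25$ are assumptions, since the notes give observed counts but not the underlying probabilities.

```python
import random
from collections import Counter


def sample_counts(n, categories, probs, seed=None):
    """Draw n independent trials and return the per-category counts."""
    rng = random.Random(seed)
    draws = rng.choices(categories, weights=probs, k=n)
    tally = Counter(draws)
    return tuple(tally[c] for c in categories)


fruits = ("apples", "bananas", "oranges")
assumed_probs = (0.40, 0.35, 0.25)   # assumption, chosen for illustration only

counts = sample_counts(100, fruits, assumed_probs, seed=0)
print(counts, sum(counts))           # the counts always sum to 100
```

Changing the seed gives a different count vector, but the total always equals the number of respondents.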

2. Second Concept: Probability Mass Function and Counting Arrangements

The multinomial probability formula combines two parts: the number of possible arrangements and the probability of one particular arrangement.
The term $\frac{n!}{x_1!x_2!\cdots x_k!}$ counts how many different sequences produce the same category counts.

Why this matters: If outcomes are observed only as counts, then many different sequences can lead to the same result. For example, with 3 trials and categories A and B, the count $(2,1)$ can happen in several orders: AAB, ABA, BAA. The multinomial coefficient counts all such arrangements.

For $k$ categories, the number of sequences with counts $x_1,x_2,\dots,x_k$ is:

$\frac{n!}{x_1!x_2!\cdots x_k!}$

This is called the multinomial coefficient.
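
The counting claim can be checked by brute force for small cases. In this sketch the function names `multinomial_coefficient` and `distinct_orderings` are illustrative assumptions; the second function lists every distinct sequence explicitly so that the formula can be compared with a direct count.

```python
from itertools import permutations
from math import factorial


def multinomial_coefficient(counts):
    """Return n! / (x_1! x_2! ... x_k!) for the given counts."""
    result = factorial(sum(counts))
    for x in counts:
        result //= factorial(x)   # each division is exact
    return result


def distinct_orderings(labels, counts):
    """List every distinct sequence with the given per-label counts."""
    multiset = [lab for lab, x in zip(labels, counts) for _ in range(x)]
    return sorted({"".join(p) for p in permutations(multiset)})


print(multinomial_coefficient((2, 1)))      # 3
print(distinct_orderings("AB", (2, 1)))     # ['AAB', 'ABA', 'BAA']
```

The brute-force enumeration is only practical for small $n$, which is exactly why the closed-form coefficient is useful.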

The full probability formula is:

$P(X_1=x_1,\dots,X_k=x_k) = \frac{n!}{x_1!x_2!\cdots x_k!} \prod_{i=1}^{k} p_i^{x_i}$

where $\prod$ means multiply the terms together.

Example: Suppose a die is fair, so each face has probability $1/6$. If we roll it 3 times, what is the probability of getting:

one 1
one 2
one 3

and zero for the other faces?

Then:

$n=3$
$x_1=x_2=x_3=1$, and the remaining counts are 0 (their factors $0!=1$ and $p_i^0=1$ drop out of the formula)
$p_i=1/6$ for all faces

So:

$P=\frac{3!}{1!1!1!}(1/6)^1(1/6)^1(1/6)^1 =6\cdot \frac{1}{216} =\frac{1}{36}$

This works because there are 6 sequences of length 3 that contain one 1, one 2, and one 3 in some order, and each such sequence has probability $(1/6)^3 = 1/216$.
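
The same calculation can be reproduced exactly in Python. This is a minimal sketch: the function name `multinomial_pmf` is an assumption, and `fractions.Fraction` is used so that the result is an exact fraction rather than a rounded decimal.

```python
from fractions import Fraction
from math import factorial


def multinomial_pmf(counts, probs):
    """Exact multinomial probability of observing `counts`."""
    # Multinomial coefficient: n! / (x_1! ... x_k!)
    coeff = factorial(sum(counts))
    for x in counts:
        coeff //= factorial(x)
    # Probability of one particular sequence with these counts
    prob = Fraction(coeff)
    for x, p in zip(counts, probs):
        prob *= Fraction(p) ** x
    return prob


fair_die = [Fraction(1, 6)] * 6
print(multinomial_pmf((1, 1, 1, 0, 0, 0), fair_die))   # 1/36
```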

3. Third Concept: Properties, Mean, and Variance

Each category count $X_i$ is a random variable, and the counts always sum to the total number of trials: $X_1+X_2+\cdots+X_k=n$.
The mean and variance of each count follow directly from $n$ and the category probabilities.

For a multinomial distribution:

$E[X_i]=np_i$

This means the average number of times category $i$ appears is the total number of trials multiplied by the probability of that category.

The variance of each count is:

$\mathrm{Var}(X_i)=np_i(1-p_i)$

The covariance between two different categories $i$ and $j$ is:

$\mathrm{Cov}(X_i,X_j)=-np_ip_j \quad \text{for } i\ne j$

This negative covariance makes sense because the total is fixed at $n$: if one category count increases, at least one other count must decrease.
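
The mean, variance, and covariance formulas can be verified exactly for a small case by summing over every possible count vector. The sketch below assumes illustrative values $n=4$ and probabilities $(1/2, 3/10, 1/5)$, which are not from the notes; the helper names are likewise hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import factorial


def multinomial_pmf(counts, probs):
    """Exact multinomial probability of one count vector."""
    coeff = factorial(sum(counts))
    for x in counts:
        coeff //= factorial(x)
    prob = Fraction(coeff)
    for x, p in zip(counts, probs):
        prob *= p ** x
    return prob


def count_vectors(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    for head in product(range(n + 1), repeat=k - 1):
        if sum(head) <= n:
            yield head + (n - sum(head),)


def exact_moments(n, probs):
    """Return (means, covariance matrix) by summing over the whole support."""
    k = len(probs)
    table = [(x, multinomial_pmf(x, probs)) for x in count_vectors(n, k)]
    means = [sum(x[i] * w for x, w in table) for i in range(k)]
    cov = [
        [sum((x[i] - means[i]) * (x[j] - means[j]) * w for x, w in table)
         for j in range(k)]
        for i in range(k)
    ]
    return means, cov


n = 4                                                        # assumed for illustration
probs = [Fraction(1, 2), Fraction(3, 10), Fraction(1, 5)]    # assumed for illustration
means, cov = exact_moments(n, probs)

print(means[0] == n * probs[0])                      # E[X_1] = n p_1
print(cov[0][0] == n * probs[0] * (1 - probs[0]))    # Var(X_1) = n p_1 (1 - p_1)
print(cov[0][1] == -n * probs[0] * probs[1])         # Cov(X_1, X_2) = -n p_1 p_2
```

Exact fractions are used so that the comparisons are equalities rather than approximate floating-point checks.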

Important consequences:

The category counts are not independent of one another, even though the individual trials are independent.
The counts are linked by the total sum constraint.
The multinomial distribution generalizes the binomial distribution: if $k=2$, it becomes the binomial distribution, because with two categories one count determines the other ($X_2 = n - X_1$).
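
The reduction to the binomial distribution for $k=2$ can be checked numerically. In this sketch the values $n=10$ and $p=0.3$ and both function names are assumptions chosen for illustration; `math.comb` requires Python 3.8 or later.

```python
from math import comb, factorial, isclose


def multinomial_pmf(counts, probs):
    """Multinomial probability of `counts` using floating-point arithmetic."""
    coeff = factorial(sum(counts))
    for x in counts:
        coeff //= factorial(x)
    prob = float(coeff)
    for x, p in zip(counts, probs):
        prob *= p ** x
    return prob


def binomial_pmf(x, n, p):
    """Binomial probability of x successes in n trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)


n, p = 10, 0.3   # assumed values for illustration
matches = all(
    isclose(multinomial_pmf((x, n - x), (p, 1 - p)), binomial_pmf(x, n, p))
    for x in range(n + 1)
)
print(matches)
```

With two categories the count vector is $(x, n-x)$, so the multinomial coefficient $\frac{n!}{x!(n-x)!}$ is exactly the binomial coefficient.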

Example: If a student answers 20 multiple-choice questions by random guessing and each question has 4 choices, then:

$p_1=p_2=p_3=p_4=1/4$
The expected number of times each choice is selected is: $20 \times \frac14 = 5$

So, on average, each option should be chosen about 5 times.
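
The same example can be taken one step further with the variance and covariance formulas for the multinomial distribution; this is a worked calculation using $n=20$ and $p_i=1/4$, not a new result:

$\mathrm{Var}(X_i) = 20 \times \frac14 \times \frac34 = 3.75, \quad \sqrt{3.75} \approx 1.94$

$\mathrm{Cov}(X_i, X_j) = -20 \times \frac14 \times \frac14 = -1.25 \quad \text{for } i \ne j$

So the count for each option has a standard deviation of about 1.94 around its expected value of 5.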

Working / Process

Step 1: Identify the number of trials and categories
Determine the total number of repeated trials $n$ and list all possible outcome categories. Check that the categories are mutually exclusive and that every trial must produce exactly one category.
Step 2: Assign probabilities and observed counts
Write down the probability $p_i$ and the observed count $x_i$ for each category. Make sure the counts satisfy $x_1+x_2+\cdots+x_k=n$ and the probabilities satisfy $p_1+p_2+\cdots+p_k=1$.
Step 3: Apply the multinomial formula
Use $P(X_1=x_1,\dots,X_k=x_k)=\frac{n!}{x_1!x_2!\cdots x_k!}\prod_{i=1}^{k}p_i^{x_i}$. First compute the multinomial coefficient, then multiply by the probability terms. This gives the probability of observing exactly those counts.

Example process:

A bag contains red, blue, and green balls, and a single draw yields each color with probability $0.2$, $0.5$, and $0.3$ respectively.
Three balls are drawn with replacement.
Find the probability of counts $(1,1,1)$.

Step 1: $n=3$, $k=3$
Step 2: $x_1=x_2=x_3=1$, probabilities are $0.2,0.5,0.3$
Step 3: $P=\frac{3!}{1!1!1!}(0.2)^1(0.5)^1(0.3)^1 =6(0.03)=0.18$
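
The three steps can be sketched as a single Python function that validates the inputs before applying the formula. The function name `multinomial_probability` and the specific error messages are assumptions for illustration, not a reference implementation.

```python
from math import factorial, isclose


def multinomial_probability(n, counts, probs):
    """Follow the three steps: check the setup, check the inputs, apply the formula."""
    # Step 1: one count and one probability per category
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    # Step 2: counts must be non-negative and sum to n; probabilities must sum to 1
    if any(x < 0 for x in counts) or sum(counts) != n:
        raise ValueError("counts must be non-negative and sum to n")
    if any(p < 0 for p in probs) or not isclose(sum(probs), 1.0):
        raise ValueError("probs must be non-negative and sum to 1")
    # Step 3: multinomial coefficient times the product of p_i ** x_i
    coeff = factorial(n)
    for x in counts:
        coeff //= factorial(x)
    prob = float(coeff)
    for x, p in zip(counts, probs):
        prob *= p ** x
    return prob


result = multinomial_probability(3, (1, 1, 1), (0.2, 0.5, 0.3))
print(round(result, 2))   # 0.18
```

Validating first mirrors the order of the steps: a count vector that does not sum to $n$ is rejected before any arithmetic is done.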

Advantages / Applications

Useful for modeling real-life situations with more than two outcomes, such as dice rolls, survey choices, machine classification results, and genetic categories.
Provides a clear mathematical way to compute probabilities of count patterns across multiple categories.
Forms the basis for many statistical methods in data science, quality control, and hypothesis testing, especially when dealing with categorical data.

Common applications include:

Marketing surveys with multiple response options
Medical diagnostics with several possible diagnoses
Text classification and machine learning
Genetic inheritance and allele frequencies
Quality control with defect types
Games of chance involving more than two outcome types

Because it handles multiple categories in one model, the multinomial distribution is especially valuable whenever analysts care about the distribution of counts rather than the exact order of events.

Summary

The multinomial distribution models counts across several categories after a fixed number of independent trials.
Its probability formula uses a multinomial coefficient and category probabilities.
It extends the binomial distribution to more than two outcomes.
Important terms to remember: trial, category, count, multinomial coefficient, probability mass function.