Statistics for Data Science – Test for One Proportion

When working with categorical data in data science, one common question is whether the proportion of a particular category in a sample matches a hypothesized value in the population. The test for one proportion is a statistical method used to answer that question. It’s especially useful in quality control, survey analysis, and A/B-style experiments where a single variant is compared against a fixed baseline rate.

What Is a One-Proportion Z-Test?

The one-proportion z-test is used to determine whether the observed proportion in a sample significantly differs from a specified population proportion. This test falls under the category of hypothesis testing and assumes a normal approximation to the binomial distribution, which is valid when the sample size is large enough.

Example Use Case

Suppose a product manager claims that 60% of users who sign up for a free trial eventually convert to paying customers. You collect a random sample of 150 users and find that only 75 of them converted. You want to test whether the conversion rate is actually different from 60%.

Step-by-Step Approach

1. Formulate Hypotheses

We begin by setting up the null and alternative hypotheses:

Null Hypothesis (H₀): The true proportion is equal to the claimed value. H₀: p = p₀
Alternative Hypothesis (H₁): The true proportion is different (two-tailed), less than, or greater than the claimed value.
H₁: p ≠ p₀ (two-tailed)
H₁: p < p₀ (left-tailed)
H₁: p > p₀ (right-tailed)

2. Check Assumptions

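To use the z-test, the observations should come from a random sample and be independent of one another, and the sample size must be large enough for the normal approximation to hold. A common rule of thumb is that both of the following are true: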

np₀ ≥ 10
n(1 – p₀) ≥ 10
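
The two conditions above can be checked in a couple of lines. The following is a minimal sketch; the function name normal_approximation_ok and the threshold parameter are illustrative choices, not part of any library, and the default of 10 simply mirrors the rule of thumb stated above.

```python
def normal_approximation_ok(n: int, p0: float, threshold: float = 10.0) -> bool:
    """Return True when both expected counts under H0 reach the threshold."""
    expected_successes = n * p0        # n * p0
    expected_failures = n * (1 - p0)   # n * (1 - p0)
    return expected_successes >= threshold and expected_failures >= threshold


# A large sample with a moderate p0 passes the check.
print(normal_approximation_ok(150, 0.60))
# A small sample with a rare outcome does not (expected successes = 1).
print(normal_approximation_ok(20, 0.05))
```

When the check fails, the z-test's p-value may be unreliable, and an exact binomial test is usually the safer choice.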

3. Calculate the Test Statistic

The test statistic is calculated as:

z = (p̂ - p₀) / √(p₀(1 - p₀) / n)

Where:

p̂ = observed sample proportion = x/n
p₀ = hypothesized population proportion
n = sample size
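
As a rough illustration, the formula translates directly into Python using only the standard library. The function name one_proportion_z is a hypothetical helper introduced here; note that the standard error uses p₀ (the hypothesized value), not p̂.

```python
import math


def one_proportion_z(x: int, n: int, p0: float) -> float:
    """z statistic for a one-proportion test: x successes out of n trials."""
    p_hat = x / n                                  # observed sample proportion
    standard_error = math.sqrt(p0 * (1 - p0) / n)  # standard error under H0
    return (p_hat - p0) / standard_error


# Example: 75 successes in 150 trials, tested against p0 = 0.60.
z = one_proportion_z(75, 150, 0.60)
print(f"z = {z:.2f}")
```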

4. Determine the p-value

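Use the standard normal distribution (Z-distribution) to find the p-value corresponding to the z-score. The p-value is the probability, assuming H₀ is true, of observing a test statistic at least as extreme as the one calculated. For a two-tailed test it is 2 × P(Z ≥ |z|); for a left-tailed test it is P(Z ≤ z); for a right-tailed test it is P(Z ≥ z).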
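
The tail-area lookup can be sketched with statistics.NormalDist from Python's standard library. The helper name p_value_from_z and the labels "two-sided", "less", and "greater" are assumptions made for this example; they correspond to the three forms of H₁ listed in step 1.

```python
from statistics import NormalDist


def p_value_from_z(z: float, alternative: str = "two-sided") -> float:
    """p-value of a z statistic under the standard normal distribution."""
    std_normal = NormalDist()  # mean 0, standard deviation 1
    if alternative == "two-sided":
        # Both tails: twice the area beyond |z|.
        return 2 * std_normal.cdf(-abs(z))
    if alternative == "less":
        # Left tail: P(Z <= z).
        return std_normal.cdf(z)
    if alternative == "greater":
        # Right tail: P(Z >= z).
        return 1 - std_normal.cdf(z)
    raise ValueError("alternative must be 'two-sided', 'less', or 'greater'")


print(f"{p_value_from_z(-1.96):.3f}")          # two-tailed
print(f"{p_value_from_z(-1.96, 'less'):.3f}")  # left-tailed
```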

5. Make a Decision

If the p-value ≤ α (commonly 0.05), reject the null hypothesis.
If the p-value > α, fail to reject the null hypothesis.

Example Calculation

From our earlier example:

Hypothesized proportion p₀ = 0.60
Sample size n = 150
Observed conversions x = 75
Observed proportion p̂ = 75/150 = 0.50

Check assumptions:

np₀ = 150 × 0.60 = 90
n(1 – p₀) = 150 × 0.40 = 60

Both values are at least 10, so the normal approximation is reasonable.

Calculate z:

z = (0.50 - 0.60) / √(0.60 × 0.40 / 150) = -0.10 / 0.04 = -2.50

A z-score of -2.50 corresponds to a p-value ≈ 0.012 for a two-tailed test. Since 0.012 < 0.05, we reject the null hypothesis at the 5% significance level. There is statistically significant evidence that the true conversion rate is different from 60%.
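
To make the arithmetic easy to verify, the worked example can be recomputed end to end in a few lines of Python. This is a sketch using only the standard library; the variable names are illustrative, and α = 0.05 is the significance level used in the text.

```python
import math
from statistics import NormalDist

# Inputs from the example.
p0 = 0.60     # hypothesized proportion
n = 150       # sample size
x = 75        # observed conversions
alpha = 0.05  # significance level

# Step 2: check the large-sample conditions.
assumptions_ok = n * p0 >= 10 and n * (1 - p0) >= 10

# Step 3: test statistic, with the standard error computed under H0.
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)

# Step 4: two-tailed p-value from the standard normal distribution.
p_value = 2 * NormalDist().cdf(-abs(z))

# Step 5: decision.
reject_null = p_value <= alpha

print(f"assumptions met: {assumptions_ok}")
print(f"p_hat = {p_hat:.2f}, z = {z:.2f}, p-value = {p_value:.4f}")
print(f"reject H0 at alpha = {alpha}: {reject_null}")
```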

Final Thoughts

The one-proportion z-test is a foundational tool in data science when evaluating categorical outcomes. Whether you’re testing user behavior, marketing effectiveness, or quality metrics, understanding how to apply this test properly is essential. Always be sure to check assumptions and interpret results in the context of your data and domain knowledge.