Sample Size Calculation for Prevalence Surveys
To calculate the sample size for a descriptive prevalence survey, you need three key parameters: the expected prevalence (p), the desired margin of error or precision (e), and the confidence level (typically 95%, corresponding to α = 0.05). 1
Core Formula Requirements
For a descriptive cross-sectional prevalence survey, the calculation differs fundamentally from analytical studies because it does not depend on statistical power—power only applies when making statistical comparisons between groups. 1
Required Parameters
You must specify:
- Expected prevalence rate (p): An estimate of the condition's prevalence in your target population 1
- Desired precision (margin of error, e): How close you want your estimate to be to the true prevalence 1
- Confidence level: Usually set at 95% (corresponding to a significance level of α = 0.05) 1
The standard formula can be found in Eng's methodology references, though online calculators are available at http://riskcalc.org:3838/samplesize/ to simplify this process. 1
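For reference, the standard normal-approximation formula under simple random sampling is

$$ n = \frac{z_{1-\alpha/2}^{2}\, p\,(1-p)}{e^{2}} $$

where z is the standard normal quantile (about 1.96 at 95% confidence). A minimal sketch in Python (the helper name prevalence_sample_size is hypothetical):

```python
import math
from statistics import NormalDist

def prevalence_sample_size(p: float, e: float, confidence: float = 0.95) -> int:
    """Sample size to estimate a prevalence p to within +/-e,
    using the normal approximation under simple random sampling."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # 1.96 for 95% confidence
    return math.ceil(z ** 2 * p * (1 - p) / e ** 2)

# Worked example: expected prevalence 20%, precision +/-5 percentage points
print(prevalence_sample_size(0.20, 0.05))  # -> 246
```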
Critical Adjustments to Your Calculated Sample Size
Account for Non-Response
You must inflate your calculated sample size to account for expected non-response rates. 1
- If you expect a 70% response rate and need 500 participants, you must invite 500/0.70 ≈ 715 subjects (always rounding up), as in the sketch after this list 1
- As a general rule, increase the sample by 5% for every confounder you plan to adjust for in analysis 1
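A minimal sketch of the non-response inflation (the helper name is hypothetical; note that the rounding is always upward, since rounding down would leave the study short):

```python
import math

def inflate_for_nonresponse(n_required: int, response_rate: float) -> int:
    """Number of subjects to invite so that, at the expected response
    rate, about n_required actually complete the survey."""
    return math.ceil(n_required / response_rate)

print(inflate_for_nonresponse(500, 0.70))  # -> 715 invitations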
Plan for Subgroup Analyses
If you intend to analyze males and females separately, or examine different subgroups, your sample size must be adequate for these subsample analyses. 1 This often requires substantially larger overall sample sizes than a simple prevalence estimate alone.
Practical Minimum Thresholds
Recent research suggests practical constraints:
- Sample sizes below 15 individuals typically yield unacceptable precision 2
- A practical minimum is to sample until you detect at least 5 cases and 5 non-cases, which works well except at extreme prevalence values (1% or 99%) 2
- For prevalence between 10-90%, minimum sample sizes of 16-45 may be acceptable, though with high uncertainty 2
- Optimal precision plateaus around 110-135 individuals for prevalence between 5% and 95%, making larger samples optional rather than essential; the sketch below illustrates this diminishing return 2
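To see the plateau concretely, this sketch (my illustration, not taken from the cited study) computes the approximate 95% CI half-width at a 20% expected prevalence for increasing sample sizes, using the simple normal approximation (which is least reliable at extreme prevalences):

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # ~1.96 for a 95% CI

def half_width(p: float, n: int) -> float:
    """Approximate 95% CI half-width for an estimated prevalence p at size n."""
    return z * math.sqrt(p * (1 - p) / n)

for n in (15, 50, 120, 500, 2000):
    print(f"n={n:4d}  +/-{half_width(0.20, n):.3f}")
# n=  15  +/-0.202
# n=  50  +/-0.111
# n= 120  +/-0.072
# n= 500  +/-0.035
# n=2000  +/-0.018
```

Going from 120 to 500 subjects buys less than four percentage points of extra precision, which is the diminishing return behind the 110-135 plateau.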
Common Pitfalls to Avoid
Choosing the Wrong Expected Prevalence
- If you have no local data, use international figures or data from similar populations 1
- For the common case of prevalence below 50%, underestimating it yields a sample too small for the planned precision (n scales with p(1 − p)); overestimating it inflates the sample and wastes resources 3
- Consider the acceptable precision carefully: because n scales with 1/e², halving the margin of error quadruples the required sample, as the sketch below shows 1, 4
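A quick numerical illustration of that scaling at a 20% expected prevalence (my sketch, using the same normal-approximation formula as above):

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf(0.975)  # ~1.96 for 95% confidence
for e in (0.10, 0.05, 0.025):
    n = math.ceil(z ** 2 * 0.20 * 0.80 / e ** 2)
    print(f"margin +/-{e}: n = {n}")
# margin +/-0.1: n = 62
# margin +/-0.05: n = 246
# margin +/-0.025: n = 984
```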
Ignoring Sampling Strategy Impact
Your sampling method affects both sample size and analysis complexity. 1
- Probability sampling methods (simple random, stratified, cluster) are preferred over convenience sampling for validity 1
- Cluster sampling requires larger sample sizes than simple random sampling for the same precision, because subjects within a cluster tend to resemble one another (the design effect; see the sketch after this list) 1
- Stratified sampling requires weighted analysis since subjects in different strata have different inclusion probabilities 1
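For the cluster-sampling point, the usual textbook adjustment (not spelled out in the cited source) inflates the simple-random-sampling size by the design effect DEFF = 1 + (m − 1)ρ, where m is the average cluster size and ρ the intra-cluster correlation; a sketch:

```python
import math

def cluster_adjusted_n(n_srs: int, cluster_size: float, icc: float) -> int:
    """Inflate a simple-random-sampling sample size by the design effect
    DEFF = 1 + (m - 1) * rho for single-stage cluster sampling."""
    deff = 1 + (cluster_size - 1) * icc
    return math.ceil(n_srs * deff)

# Example: n = 246 under SRS, 20 subjects per cluster, ICC of 0.05
print(cluster_adjusted_n(246, 20, 0.05))  # DEFF = 1.95 -> 480
```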
Two-Phase Sampling Considerations
If using questionnaire screening followed by clinical examination:
- This approach may underestimate prevalence if many cases are asymptomatic 1
- Consider examining a random sample of screen-negatives alongside symptom-based selection to avoid this bias; the toy simulation below illustrates the effect 1
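A toy simulation of this bias (entirely my illustration; the population size and rates are invented):

```python
import random

random.seed(1)
N = 100_000
TRUE_PREV = 0.10      # 10% of the population truly has the condition
P_SCREEN_POS = 0.40   # only 40% of cases screen positive (symptomatic)

# Each person: true disease status, and (for cases only) a positive screen
case = [random.random() < TRUE_PREV for _ in range(N)]
screen_pos = [c and random.random() < P_SCREEN_POS for c in case]

# Naive two-phase estimate: clinically examine screen-positives only
print(f"true prevalence:   {sum(case) / N:.3f}")        # ~0.100
print(f"screen-based only: {sum(screen_pos) / N:.3f}")  # ~0.040, badly understated

# Remedy: also examine a random 10% of screen-negatives, then weight
# the cases found there back up by the inverse sampling fraction (x10)
negatives = [i for i, s in enumerate(screen_pos) if not s]
audit = random.sample(negatives, len(negatives) // 10)
missed = sum(case[i] for i in audit) * 10
print(f"with random audit: {(sum(screen_pos) + missed) / N:.3f}")  # ~0.100
```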
Reporting Requirements
When publishing your study, you must report:
- How you arrived at your sample size calculation with all assumptions stated 1
- Flow diagrams showing participant numbers at each stage (invited, eligible, enrolled, analyzed) 1
- Response rates and comparison of responders versus non-responders by basic demographics 1
- For longitudinal studies, the true response rate is the number of participants at follow-up divided by those initially invited, not just the response within the follow-up phase 1
Alternative Precision-Based Approach
Rather than focusing solely on statistical power, consider planning sample size based on desired confidence interval width. 4 This precision-based approach may be more appropriate for prevalence estimation than traditional power calculations, which are designed for hypothesis testing rather than parameter estimation.
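As a sketch of what that looks like in practice, the following searches for the smallest n whose anticipated 95% CI is narrower than a target width (the helper name is mine, and statsmodels' Wilson interval is one reasonable choice among several):

```python
from statsmodels.stats.proportion import proportion_confint

def n_for_ci_width(p: float, max_width: float) -> int:
    """Smallest n whose anticipated 95% Wilson CI, at an expected
    prevalence p, is no wider than max_width."""
    n = 10
    while True:
        low, high = proportion_confint(round(p * n), n, alpha=0.05, method="wilson")
        if high - low <= max_width:
            return n
        n += 1

# Plan for a 95% CI no wider than 10 percentage points at 20% prevalence
print(n_for_ci_width(0.20, 0.10))  # ~245
```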