Section 3: Evaluation Design

Step 4: Determine sampling strategies to maximize external validity in the selected population or subpopulations for each evaluation question

To complement your evaluation design, you and your evaluation partners should determine your sampling plan. A sampling plan can help maximize the applicability of your evaluation findings to similar populations and subpopulations. The applicability of evaluation findings is commonly referred to as external validity and it can apply to people, places, and time periods.

A. Identify your sampling methods

In essence, your evaluation questions guide the selection of a sample of participants or data that represents your population or subpopulations of interest to ensure that the evaluation findings can be generalized to similar populations or subpopulations.

There are two main types of sampling methods:

  1. Probability sampling, in which population or subpopulation members are selected at random (and, in experimental designs, randomly assigned to intervention and control or comparison groups)
  2. Non-probability sampling, in which members are selected non-randomly

Figure: Sampling flow chart

This graphic [11] illustrates how the different types of probability and non-probability sampling methods relate to one another. Descriptions of each method, along with their relative strengths and weaknesses, are given in the following table.

Tools and Resources

Sampling Methods: Comparing Strengths and Weaknesses [12, 13]
This table shows descriptions of sampling methods along with strengths, weaknesses, and examples.

In creating your sampling plan, you and your partners should prioritize the sampling methods that best match your evaluation questions, design, and audiences, as well as your partners' skills and experience, your timeline, and the resources available to recruit participants and collect data.
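To make the idea of probability sampling concrete, the following is a minimal sketch of two common probability methods: simple random sampling and proportionate stratified sampling. The sampling frame, the `area` field, and the function names are hypothetical and chosen only for illustration; your own sampling plan may call for a different method.

```python
import random
from collections import defaultdict


def simple_random_sample(population, n, seed=0):
    """Draw n members so that every member has an equal chance of selection."""
    rng = random.Random(seed)
    return rng.sample(population, n)


def stratified_sample(population, stratum_of, n, seed=0):
    """Draw about n members, allocating draws to each stratum in proportion to its size."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for member in population:
        strata[stratum_of(member)].append(member)
    sample = []
    for members in strata.values():
        # Proportional allocation; rounding means the total can differ slightly from n.
        k = max(1, round(n * len(members) / len(population)))
        sample.extend(rng.sample(members, min(k, len(members))))
    return sample


# Hypothetical sampling frame: 1,000 residents, 20% in a rural subpopulation.
frame = [{"id": i, "area": "rural" if i % 5 == 0 else "urban"} for i in range(1000)]

srs = simple_random_sample(frame, 100)
strat = stratified_sample(frame, lambda m: m["area"], 100)
rural_in_strat = sum(1 for m in strat if m["area"] == "rural")
print(len(srs), len(strat), rural_in_strat)
```

In this sketch the stratified sample always contains 20 rural residents, matching their share of the frame, whereas the number of rural residents in the simple random sample varies from draw to draw. That guarantee is one reason stratified sampling is often considered when findings need to apply to a specific subpopulation.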

B. Determine your sample size

Next, you and your evaluation partners should decide on the size of the sample or samples necessary to make causal inferences about your population or subpopulations of interest based on the evaluation findings from your sample.

The goal is to increase statistical power, which is the probability that the evaluation detects an intervention effect when that effect truly exists.
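As a rough illustration of how power relates to sample size, the sketch below estimates the number of participants needed per group for a simple two-group comparison of means. It assumes a two-sided test, equal group sizes, a standardized effect size (Cohen's d), and a normal approximation; the function name and default values are illustrative, and dedicated power software will typically give slightly larger numbers for small samples.

```python
import math
from statistics import NormalDist


def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate n per group for a two-sided, two-group comparison of means.

    effect_size is the standardized mean difference (Cohen's d).
    Normal approximation: n = 2 * (z_(1 - alpha/2) + z_(power))^2 / d^2.
    """
    if effect_size <= 0:
        raise ValueError("effect_size must be positive")
    standard_normal = NormalDist()
    z_alpha = standard_normal.inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = standard_normal.inv_cdf(power)          # quantile for the target power
    return math.ceil(2 * (z_alpha + z_power) ** 2 / effect_size ** 2)


# Smaller expected effects require larger samples to reach the same power.
for d in (0.8, 0.5, 0.2):
    print(d, sample_size_per_group(d))
```

Under these assumptions, a medium effect (d = 0.5) needs about 63 participants per group at 80% power, while a small effect (d = 0.2) needs about 393. More complex designs, such as clustered or multilevel evaluations, need the specialized tools listed under Tools and Resources.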

Tools and Resources

Free and open source online calculators.


PowerUp! provides convenient Excel-based functions to determine the minimum detectable effect size and the minimum required sample size for various experimental and quasi-experimental designs.

An R package version of PowerUp! that also includes functions to determine sample size for various multilevel randomized experiments, with or without budgetary constraints.

The R package pwr.

Russ Lenth's power and sample-size page.

Free online statistical power analysis.

App for Android and iOS (iPhone and iPad).

These power calculations should account for precision of the intervention effect estimates, systematic errors in the data collected or analyzed, and loss of participants in follow-up assessment over time (attrition).
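Of the factors above, attrition is the simplest to illustrate. The sketch below inflates a required sample size so that enough participants are expected to remain at follow-up. It assumes a single overall attrition rate that you estimate in advance; the function name is hypothetical, and the adjustment does not address precision or systematic error, which need separate treatment.

```python
import math


def inflate_for_attrition(n_required, attrition_rate):
    """Return the number to recruit so that about n_required remain after attrition.

    attrition_rate is the expected proportion lost to follow-up (0 <= rate < 1).
    """
    if not 0 <= attrition_rate < 1:
        raise ValueError("attrition_rate must be in the range [0, 1)")
    return math.ceil(n_required / (1 - attrition_rate))


# Example: 63 participants needed per group, 20% expected loss to follow-up.
print(inflate_for_attrition(63, 0.20))
```

In this example you would plan to recruit 79 participants per group in order to retain roughly 63 at follow-up.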

Seek out referrals for statistical expertise from government agencies or academic institutions when more rigorous sampling and power techniques are needed.
