Despite the diversity of
design, costs, and other factors, the aim of evaluation methods in use today is
essentially the same, i.e., to assess the effect of an intervention on one group compared
to the effect of a different intervention (or no intervention) on another group. By
definition, all evaluations have a control or comparison group. Exhibit 2 depicts a basic
framework for considering the methodological rigor of evaluation types, their respective
study elements, and examples of organ donation activity evaluations.
The evaluation types in Exhibit 2 are listed in rough order of most
to least scientifically rigorous for internal validity (i.e., for accurately representing
the causal relationship between an intervention and an outcome in the particular
circumstances of a study). This ordering of methods assumes that each study is properly
designed and conducted; a poorly conducted large RCT may yield weaker findings than a
well-conducted study that is lower on the design hierarchy. The list is representative; there
are other variations of these methodological designs, and some investigators use different
terminology for certain methods (Appendix A contains definitions of the evaluation types
listed in Exhibit 2).
As Exhibit 2 depicts, every evaluation has strengths and
weaknesses. There are typically trade-offs involving rigor, cost, and feasibility. The
importance of this trade-off depends largely on the activity under study and the goals and
resources of the organization conducting the evaluation. For example, it is possible to
design a prospective study to evaluate the effect of a national media campaign on actual
organ donation rates. However, the time and resources (e.g., money) needed to track
millions of people over time to capture what may be small differences in donation rates
might not be feasible. A prospective evaluation design may more appropriately be used to
assess post-event activities because fewer people (i.e., only those who become potential
donors) have to be tracked over a shorter period of time to capture a change in the
donation rate. For example, a prospective study might feasibly assess the effect on
consent rates of decoupling the discussion of organ donation from the announcement of
brain death.
Given resource and time constraints, and the difficulties
associated with randomizing and perfectly controlling "real-world" studies, it
may not be possible, and is often not practical, to conduct a randomized controlled study.
However, all evaluations can include elements that strengthen the methodology of the study
and produce more rigorous results. For example, a more valid comparison group can often
improve a study design. In a time-series study, one group (e.g., a hospital) is measured at
baseline (e.g., consent rate, donation rate), subjected to an intervention (e.g., an
in-service provider education program), and then re-measured at several intervals to assess
changes in performance indicators. A more rigorous approach would be an external control,
for example, a hospital not receiving the in-service program. The control hospital is
subject to the same external influences as the study hospital (e.g., a concurrent mass
media campaign), and thus the effects of the intervention can be measured more accurately.
Examples from the behavioral literature provide insight into how to select and randomize
control groups for "health behavior interventions" similar to changing personal
behavior with regard to organ donation (Appendix C).
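As a concrete sketch of the external-control logic, the short calculation below (written in Python, with hypothetical consent rates) subtracts the control hospital's change from the study hospital's change, a simple difference-in-differences comparison:

```python
# Hypothetical consent rates for a study hospital that received an
# in-service education program and an external control hospital that did not.
study_baseline, study_followup = 0.48, 0.60
control_baseline, control_followup = 0.47, 0.52

# Change observed in each hospital over the same time period.
study_change = study_followup - study_baseline        # 0.12
control_change = control_followup - control_baseline  # 0.05

# The control hospital's change reflects shared external influences
# (e.g., a concurrent mass media campaign); subtracting it isolates
# the change attributable to the intervention itself.
intervention_effect = study_change - control_change   # 0.07

print(f"Estimated intervention effect: {intervention_effect:.2f}")
```

With a self-controlled (one-group) design, the full 12-point change would be credited to the program; the external control reveals that roughly 5 points of it would have occurred anyway.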
Exhibit 2: General Strengths and
Weaknesses of Evaluation Types
Source: Lewin, 1998
The following general guidelines are helpful
for weighing the relative rigor of alternative types of controlled studies.
- Randomized studies are stronger than nonrandomized studies.
Randomized studies require the assignment of
subjects to intervention and control groups based on a chance distribution. This technique
is used to diminish subject selection bias in controlled studies.
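As an illustration only (the unit names and group size below are hypothetical), a chance-based assignment might be carried out as follows:

```python
import random

# Hypothetical pool of units (e.g., hospitals) to be assigned by chance.
subjects = [f"hospital_{i:02d}" for i in range(1, 21)]

random.seed(42)           # fixed seed so the assignment can be reproduced
random.shuffle(subjects)  # chance distribution across the pool

# Split the shuffled pool evenly into the two groups.
half = len(subjects) // 2
intervention_group = sorted(subjects[:half])
control_group = sorted(subjects[half:])

print("Intervention group:", intervention_group)
print("Control group:", control_group)
```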
- Prospective studies are stronger than retrospective studies.
In a prospective study, the investigators follow a group of subjects
forward from the start of an intervention and analyze the outcomes as they occur. In a
retrospective study, investigators select groups of subjects who have already been subject
to an intervention and analyze how the intervention relates to the outcomes.
- Large studies are stronger than small studies.
The sample size of a study should be large enough to have
an acceptable probability of detecting a difference in outcomes, if such a difference
truly exists, between the experimental and control groups attributable to the intervention
being evaluated. Although larger studies increase the statistical power of the evaluation,
there is a point beyond which there are diminishing returns and studies may become
unnecessarily costly and inefficient.
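As a rough illustration, the sketch below applies the standard normal-approximation formula for comparing two proportions; the consent rates, significance level, and power shown are assumptions chosen for the example:

```python
from statistics import NormalDist

# Hypothetical consent rates: 50% under usual practice, 60% hoped for
# under the intervention. Alpha and power are conventional assumptions.
p1, p2 = 0.50, 0.60
alpha, power = 0.05, 0.80

z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # ~1.96
z_beta = NormalDist().inv_cdf(power)           # ~0.84

# Standard normal-approximation formula for two independent proportions.
n_per_group = ((z_alpha + z_beta) ** 2 *
               (p1 * (1 - p1) + p2 * (1 - p2))) / (p1 - p2) ** 2

print(f"Approximately {n_per_group:.0f} subjects per group")  # ~385
```

Note how the required sample grows as the expected difference shrinks: halving the difference to 5 percentage points roughly quadruples the subjects needed per group, which is why small expected effects can make a study prohibitively costly.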
- Contemporaneous controls are stronger than historical controls.
A contemporaneous control group exists when the results of
an intervention group and a control group are compared over the same time period. A
historical control group exists when the results of an intervention group are compared
with the results of a control group observed at some previous time.
- External controls (multiple-group designs) are stronger
than self-controls (one-group designs).
A multiple-group design exists when comparisons are made
between one group receiving the intervention and one group not receiving the intervention
(control). A one-group design exists when the experience of a single group is compared
before (control) and after an intervention.