**Were subjects randomly assigned to the treatment group?**

Randomization is a pivotal step in determining the validity of a study.^{3} It ensures that each subject has the same probability of being assigned to the active treatment protocol as to a control treatment or placebo. It also allows study results to be generalized to a larger population of interest. Randomization can be as simple as “flipping a coin” or using a random number generator.
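As a minimal sketch of the random number generator approach, the following Python snippet assigns each subject to one of two arms with equal probability. The function name `randomize`, the arm labels, and the seed value are illustrative assumptions, not part of any published protocol.

```python
import random

def randomize(subject_ids, seed=None):
    """Assign each subject to 'treatment' or 'control' with equal probability.

    Hypothetical helper for illustration; equivalent to one coin flip per subject.
    """
    rng = random.Random(seed)  # seeded generator so the allocation can be reproduced
    return {sid: rng.choice(["treatment", "control"]) for sid in subject_ids}

# Ten hypothetical subjects, numbered 1 through 10
allocation = randomize(range(1, 11), seed=2024)
for sid, arm in allocation.items():
    print(sid, arm)
```

Simple randomization of this kind does not guarantee equal group sizes, particularly in small studies.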

The strengths and weaknesses of each article must be evaluated independently. If clinical practice recommendations based on RCTs are not available, physicians may choose to look at the results of nonrandomized studies. Causation (“X causes Y”) cannot be established using observational studies (eg, case control study, cohort study), however. In most instances, only an association between a therapeutic intervention and desired outcome can be interpreted from the results of observational studies.^{4}

**Were all the subjects accounted for and attributed at the end of the study?**

All enrolled subjects must be accounted for at the end of the investigation. A large “lost to follow-up” group may lead researchers to present biased study results. For example, if very ill subjects do not complete a study and are not ultimately accounted for, the study's outcome may appear favorable when a more thorough analysis of the data gathered may have led researchers to very different conclusions.

All study participants should be analyzed in the groups to which they were originally assigned. This principle is called “intention-to-treat.”^{5} This research model allows group randomization to be preserved, and the known and unknown factors affecting patient prognosis have an equal probability of impacting subjects assigned to each study group. Researchers who conduct their analysis excluding very ill subjects would obtain results that suggest the therapy was efficacious—though those study results would be compromised through bias.
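The effect of excluding very ill subjects can be illustrated with a small sketch. The records below are entirely hypothetical, and the helper `event_rate` is an assumed name; the point is only to contrast an intention-to-treat analysis with one restricted to subjects who completed the protocol.

```python
# Hypothetical records: (assigned_arm, completed_protocol, died)
subjects = (
    [("treatment", True, False)] * 80
    + [("treatment", True, True)] * 5
    + [("treatment", False, True)] * 15  # very ill subjects who stopped therapy
    + [("control", True, False)] * 80
    + [("control", True, True)] * 20
)

def event_rate(records, arm, completers_only=False):
    """Proportion of deaths in one arm, optionally restricted to completers."""
    group = [r for r in records if r[0] == arm and (r[1] or not completers_only)]
    return sum(r[2] for r in group) / len(group)

cer = event_rate(subjects, "control")                             # 20 of 100
itt_eer = event_rate(subjects, "treatment")                       # 20 of 100
pp_eer = event_rate(subjects, "treatment", completers_only=True)  # 5 of 85

print(f"Control event rate:               {cer:.3f}")
print(f"Treatment, intention-to-treat:    {itt_eer:.3f}")
print(f"Treatment, completers only:       {pp_eer:.3f}")
```

In this invented data set the intention-to-treat analysis shows no difference between arms, whereas dropping the subjects who stopped therapy makes the treatment look beneficial.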

**Was the study “blinded”?**

Study participants, clinicians, and investigators should not know which subjects are assigned to the intervention or control group. Participants who are aware of their group assignments may bias study results by behaving and/or responding differently. Clinicians who are aware of subject assignments may treat individuals in the intervention group differently than those in the control group, unconsciously (or consciously) manipulating the study design or analysis.

*Double-blinding* refers to the processes of keeping group assignments concealed from study subjects and investigators. Sometimes, however, a double-blinded research protocol is simply not possible given the intervention used. For example, a study that investigates the efficacy of surgical vs nonsurgical procedures is unable to conceal group assignments from the subjects or surgeons. When possible, however, all measures of progress and improvement for such studies should be concealed from primary investigators through the use of independent evaluators.

**Were the study groups similar at the start of the investigation?**

The demographics and description of study participants are usually found in the first table of an article. Established risk factors may affect the study outcome, so it is important to determine whether these factors are equally balanced between the intervention and control groups. For example, to determine the overall benefit of a surgical procedure, one would need to know whether comorbid conditions (eg, coronary heart disease, diabetes mellitus) are balanced between the two groups. Randomization does not always guarantee an equal balance of demographic factors and medical history between groups. If the difference in a variable (eg, age) is large, it may bias study results. Clinical investigations also need a sufficient number of subjects (sample size) to be able to detect a meaningful difference in the outcome (sufficient power).^{6} Small studies have a greater probability of having an unequal distribution of baseline subject characteristics. At times, however, a deficiency of this kind may be overcome using appropriate statistical tools (eg, regression analyses).
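The link between study size and baseline imbalance can be sketched with a short simulation. Everything here is an assumption made for illustration: a single binary risk factor with 30% prevalence, two equally sized arms, and the hypothetical function name `mean_imbalance`.

```python
import random

def mean_imbalance(n_per_arm, prevalence=0.3, trials=200, seed=0):
    """Average absolute difference in risk-factor prevalence between two
    randomly formed arms, over repeated simulated studies."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # Proportion of subjects carrying the risk factor in each arm
        arm_a = sum(rng.random() < prevalence for _ in range(n_per_arm)) / n_per_arm
        arm_b = sum(rng.random() < prevalence for _ in range(n_per_arm)) / n_per_arm
        total += abs(arm_a - arm_b)
    return total / trials

small = mean_imbalance(10)    # 10 subjects per arm
large = mean_imbalance(1000)  # 1000 subjects per arm
print(f"Mean imbalance, 10 per arm:   {small:.3f}")
print(f"Mean imbalance, 1000 per arm: {large:.3f}")
```

Under these assumptions, the average between-arm difference is far larger in the small simulated studies than in the large ones.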

**How big was the treatment effect?**

Once it has been determined that a particular article on therapy is valid, the physician should evaluate the magnitude of the treatment effect and its precision. Only basic mathematical and statistical skills are required for this kind of postpublication review and analysis of the medical literature.

Most journal articles report outcomes in a dichotomous fashion. For example, one may choose to evaluate whether daily use of aspirin prolongs life in the 2 years after an initial myocardial infarction. A researcher might then compare an event of interest (eg, mortality) among those who received aspirin and those who received nothing (or placebo). The proportion of those who died in the placebo group determines what is called the *control event rate* (CER). The CER is considered the baseline risk for patients who meet study inclusion and exclusion criteria. The proportion of study subjects in the intervention group who died determines the *experimental event rate* (EER). The truncated table shown in Figure 3 provides a hypothetical example of just one possible study outcome (ie, mortality ≤2 y postinfarction).

Although the difference between the EER and CER reported in the table may appear to be statistically significant, the data as presented do not provide us with any clinically useful information.

Numerical terms that can be applied to our patients and allow us to explain potential outcomes are needed. It is for this reason that many articles report the relative risk reduction (RRR). When the EER is subtracted from the CER and that difference is then divided by the CER, the result is the RRR.^{7} To elaborate using the example provided by the data in the table (Figure 3), the RRR=[0.15-0.05]/0.15=0.67 or 67%. A physician reading this number in a medical journal can then safely say that aspirin, relative to placebo, decreased patients' risk of death by 67% in the 2 years after an initial myocardial infarction.

Although this finding appears to be impressive since it suggests a large treatment effect, it still conveys incomplete information to the reader because it does not take into account patients' baseline risk (ie, CER) of death during the 24 months postinfarction. The RRR alone cannot discriminate large treatment effects from small ones. For example, with a postinfarction CER of 0.00015% and an EER of 0.00005%, the RRR will still be 67%. Because the baseline risk (ie, CER) is small, a further decrease in risk will have only minimal clinical impact. It is for this reason that the RRR is not the best calculation to use in clinical practice.

The absolute risk reduction (ARR), which is calculated by subtracting the EER from the CER, takes the baseline risk into account.^{7} Using the same example (Figure 3), the ARR would be calculated by subtracting the 0.05 EER from the 0.15 CER for an ARR of 0.10 or 10%. This analysis would lead physicians practicing EBM to reach a very different conclusion from physicians who consider only the RRR of 67%.
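The contrast between the two measures can be checked with a few lines of Python. This is a sketch using the event rates quoted in the text, expressed as proportions (so a rate of 0.00015% becomes 0.0000015); the function names `rrr` and `arr` are illustrative.

```python
def rrr(cer, eer):
    """Relative risk reduction: (CER - EER) / CER."""
    return (cer - eer) / cer

def arr(cer, eer):
    """Absolute risk reduction: CER - EER."""
    return cer - eer

scenarios = [
    ("Figure 3 example", 0.15, 0.05),
    ("Very low baseline risk", 0.0000015, 0.0000005),  # 0.00015% and 0.00005%
]

for label, cer, eer in scenarios:
    print(f"{label}: RRR = {rrr(cer, eer):.0%}, ARR = {arr(cer, eer):.5%}")
```

Both scenarios give the same RRR of about 67%, while the ARR falls from 10% to a tiny fraction of a percent when the baseline risk is very low.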

At times, it is difficult to recall an ARR value. In addition, physicians want to convey the information available to their patients in a manner that is easy for them to understand. For these purposes, the number needed to treat (NNT) can be calculated. The NNT is computed by taking the reciprocal of the ARR, that is, dividing 1 by the ARR.^{1} To calculate the NNT for the study reported in the table (Figure 3), one would divide 1 by 0.10 for an NNT of 10. In other words, 10 patients need to take 81 mg of aspirin daily for 2 years postinfarction to prevent one death.
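The same arithmetic can be written as a short Python function. This is a minimal sketch; the name `nnt` and the error handling are assumptions. The result is left unrounded because floating-point subtraction can land a hair above a whole number, although by convention fractional NNT values are rounded up when reported.

```python
def nnt(cer, eer):
    """Number needed to treat: the reciprocal of the absolute risk reduction."""
    arr = cer - eer
    if arr <= 0:
        # No reduction in risk, so an NNT is not meaningful here
        raise ValueError("EER must be lower than CER to compute an NNT")
    return 1 / arr

print(f"Figure 3 example:       NNT = {nnt(0.15, 0.05):.0f}")
print(f"Very low baseline risk: NNT = {nnt(0.0000015, 0.0000005):,.0f}")
```

With the very low baseline risk used earlier in the text (CER 0.00015%, EER 0.00005%), the NNT rises to roughly one million despite the identical RRR.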

Physicians must then decide whether an NNT of 10 is clinically significant or remarkable. This determination can be made by comparing the number to other NNTs for interventions with a similar therapy duration. The disease itself and the severity of the outcome must also be taken into consideration. For example, one may be willing to administer a particular therapy when an outcome is severe (eg, NNT of 50 at 1 year of treatment for cancer). Yet, with the same NNT, one may be reluctant to prescribe an antibiotic to manage a mild upper respiratory infection when it is known that the medication would shorten the symptomatic phase of the illness by only 1 or 2 days (Figure 4).

**How precise is the estimate of the treatment effect?**

When a result is calculated (CER, EER, RRR, ARR, or NNT), it represents an estimate of some theoretical true value. Ideally, the calculated result should be as close as possible to this true value. A range of values is used to estimate where the true measure would lie. Normally, this range of values is expressed by a 95% confidence interval (CI) and can be interpreted as: “We are 95% confident that the true value lies within the given interval.”^{7} The narrower the 95% CI, the more precise the result is considered to be. Although the *P* value is a statistical expression of significance (eg, *P*<.05), it does not provide any information on the magnitude of the effect or precision of the results. Therefore, the 95% CI is the most useful way to express the precision of a treatment effect.
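As an illustration of how sample size affects precision, the sketch below computes a 95% CI for the ARR using the normal (Wald) approximation, which is one common approach and not necessarily the method used in any given article. The group sizes of 100 and 1000 per arm are hypothetical, since the text does not state them, and the function name `arr_ci` is an assumption.

```python
import math

def arr_ci(cer, eer, n_control, n_treated, z=1.96):
    """Approximate 95% CI for the ARR (normal/Wald approximation).

    z = 1.96 corresponds to a two-sided 95% confidence level.
    """
    arr = cer - eer
    # Standard error of the difference between two independent proportions
    se = math.sqrt(cer * (1 - cer) / n_control + eer * (1 - eer) / n_treated)
    return arr - z * se, arr + z * se

# Hypothetical group sizes applied to the event rates from the text
for n in (100, 1000):
    low, high = arr_ci(0.15, 0.05, n, n)
    print(f"n = {n} per arm: ARR = 0.10, 95% CI {low:.3f} to {high:.3f}")
```

Under these assumptions, the interval is roughly 0.02 to 0.18 with 100 subjects per arm and narrows to roughly 0.07 to 0.13 with 1000 per arm, although the point estimate is unchanged.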