Regarding the statistical analysis used in the May 2014 article by Hasty et al, “Wikipedia vs Peer-Reviewed Medical Literature for Information About the 10 Most Costly Medical Conditions,”^{1} I believe that use of the McNemar test was wholly inappropriate. The data presented in the study do not meet the statistical considerations required for this analysis. The McNemar test for correlated proportions requires paired observations that could be placed into a 2×2 contingency table. Consider *Gurzell Table 1*, a hypothetical, counterfactual example that could be used to test whether the assertions found in Wikipedia agree with peer-reviewed sources.

The McNemar test assesses whether there is marginal homogeneity between paired observations^{2} (eg, no statistically significant difference in dichotomous observations from Wikipedia vs peer-reviewed literature). Because the McNemar test evaluates correlated proportions, nothing is gained from pairs that agree with each other; therefore, the calculation takes only discordant paired observations into account. Given *Gurzell Table 1*, the McNemar test results in a χ^{2} test statistic obtained from the following formula:
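For reference, the standard form of the McNemar statistic can be written as below. The cell labels are an assumption made for illustration and are not taken from *Gurzell Table 1*: *b* and *c* denote the two discordant cells of the 2×2 table (the pairs in which the two paired observations disagree), and the concordant cells do not enter the calculation.

$$
\chi^{2} = \frac{(b - c)^{2}}{b + c}, \qquad \text{df} = 1
$$

Some calculators apply a continuity correction and report $(|b - c| - 1)^{2} / (b + c)$ instead, so results from different tools may differ slightly.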

The collected data in the published study^{1} evaluated whether an assertion made in a Wikipedia article was verified by peer-reviewed sources. This structure constitutes a single dichotomous observation (verified vs not verified) and cannot be used in a McNemar test because one cannot construct the appropriate 2×2 contingency table.
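To make the pairing requirement concrete, the following is a minimal Python sketch of how a McNemar statistic would be computed if each item had been observed under 2 conditions. The function name `mcnemar_from_pairs` and all counts are hypothetical and invented for illustration; none of the numbers come from Hasty et al. The sketch uses the uncorrected statistic and the standard library only.

```python
import math
from collections import Counter


def mcnemar_from_pairs(pairs):
    """Uncorrected McNemar test on paired dichotomous observations.

    `pairs` is an iterable of (first, second) booleans, one tuple per item,
    where both values describe the SAME item under two conditions.
    Returns (chi2, p_value) with 1 degree of freedom.
    """
    counts = Counter(pairs)
    b = counts[(True, False)]   # discordant: yes under first, no under second
    c = counts[(False, True)]   # discordant: no under first, yes under second
    if b + c == 0:
        return 0.0, 1.0         # no discordant pairs, nothing to test
    chi2 = (b - c) ** 2 / (b + c)
    # Survival function of a chi-square variable with 1 df
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical paired data: 66 items, each classified under two conditions
pairs = (
    [(True, True)] * 40      # concordant, ignored by the test
    + [(True, False)] * 12   # discordant
    + [(False, True)] * 4    # discordant
    + [(False, False)] * 10  # concordant, ignored by the test
)
chi2, p_value = mcnemar_from_pairs(pairs)
print(f"chi2 = {chi2:.2f}, P = {p_value:.4f}")
```

The function needs a pair of values for every item. A data set that records only a single verified or not verified outcome per assertion supplies 1 value per item, so the discordant counts cannot be formed.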

I believe that the authors mistakenly used the McNemar test for the data presented in their article's table 3.^{1} Given that the rows are 2 independent observations from reviewers 1 and 2, applying the above calculation to the data would be inappropriate.

The data in the Hasty et al article were organized as shown in *Gurzell Table 2*. I am able to recreate the *P* values by assuming that the table is set up for a McNemar test, with the resulting equation:

I am concerned that I am able to replicate 29 of the 30 *P* values reported in Hasty et al's table by incorrectly performing the McNemar test in the way described above (using both GraphPad software and the VassarStats online calculator). I believe that the data presented in Hasty et al^{1} were inappropriately analyzed using the McNemar test, thus leading to nonsensical statistical output.

I respect and agree with the assertion that Wikipedia is not an appropriate medical reference, and I agree with the authors' take-home message that medical professionals and medical students should consult Wikipedia with caution and, when available, use peer-reviewed science.

However, I believe that the study was incorrectly analyzed and inappropriately published through the same peer-review process that Hasty et al hold in such high esteem. It is highly unlikely that I would be able to systematically replicate all but 1 (osteoarthritis, “dissimilar data”) of their *P* values by inappropriately entering data points taken from table 3 of their article into the McNemar test unless the authors had performed the same calculation.