Diagnostic codes entered by medical students to document each patient encounter across all rotations, a standard component of their clinical training, were used for surveillance in this study. For respiratory syndromes, prior studies have shown the superiority of diagnosis codes over chief complaint alone for surveillance [5, 35, 36]. A study [5] by George Washington University compared surveillance based on chief complaints with surveillance based on diagnostic codes and found that, despite a potential delay in detection using diagnostic codes, the additional information obtained is more reliable and beneficial than surveillance using chief complaint. Children’s Hospital of Boston also demonstrated that diagnosis was superior to chief complaint for respiratory illness in the pediatric population [35]. The use of diagnosis codes also demonstrates good sensitivity and specificity. A study by the University of Pittsburgh Center for Biomedical Informatics showed that the sensitivity of detection of acute respiratory illness using ICD-9 coded diagnoses was 44%, with a specificity of 97% [36].
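For readers less familiar with these two measures, the following minimal sketch shows how sensitivity and specificity are computed from a confusion matrix. The function name and the counts are hypothetical, chosen only so that the arithmetic reproduces the 44% and 97% figures quoted above; they are not data from the cited study.

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Return (sensitivity, specificity) from confusion-matrix counts.

    sensitivity = true positives / all truly ill patients
    specificity = true negatives / all truly non-ill patients
    """
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical counts (illustration only, not taken from any study):
# 100 patients with acute respiratory illness, 1000 without.
sens, spec = sensitivity_specificity(tp=44, fn=56, tn=970, fp=30)
print(f"sensitivity={sens:.2f}, specificity={spec:.2f}")
```

In this illustration, a coded diagnosis would miss more than half of true cases while rarely flagging a patient who is not ill.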
The results of our syndromic surveillance COVID-19 validation demonstration are consistent with recently reported studies [37, 38]. In one study [37], a group at University of California-Los Angeles Health System performed a retrospective observational review of inpatient and outpatient EMR charts across three hospitals for patients who presented with a “cough” and for patients hospitalized with acute respiratory failure. The researchers found a statistically significant excess in symptoms as early as December 22, 2019, and concluded these were most likely early COVID-19 infections. In another recent study, a group from the CDC and the American Red Cross, among others, tested blood bank samples for COVID-19 reactive antibodies [38]. They found samples with reactive antibodies from California, Oregon, and Washington as early as December 13–16, 2019.
Traditional surveillance methods rely on confirmed laboratory testing and physician reporting. This reliance has important health implications, as traditional surveillance may delay the response to an evolving outbreak; for example, our results indicated a lag of nine days. Today, approximately 70% of emergency departments and 4,000 health care facilities across the nation transmit electronic health data to the CDC’s BioSense Platform daily. Syndromic surveillance of this kind can therefore be used to monitor infectious disease activity across much larger populations, enhancing detection of evolving disease outbreaks. Our results have potential instructional value for medical students, as they illustrate a unique approach that uses student-generated data for entirely new purposes, as well as a relevant statistical approach for analyzing time-dependent surveillance data. Given the annual influenza season and the potential for new and emerging infectious diseases to surface anywhere on the planet and affect health in America, it is important for future clinicians to be familiar with approaches like the one illustrated here, which are likely to be valuable in future epidemics and pandemics.
A limitation of this study is that the increase in ILI/SARI activity could have resulted from any number of possible infectious diseases, limiting our ability to conclude causation; however, it is consistent with the timeline of the COVID-19 pandemic and other similar reports [37, 38]. Additionally, surveillance data are subject to confounding factors. Another limitation is that the data are representative of the VCOM-associated hospitals and clinics throughout Appalachia, Central America, and the Caribbean (Dominican Republic, Honduras, and El Salvador) and do not represent the entire US population. A strength of the study was that data were accumulated from thousands of patients at many hospitals and clinics. Another advantage was the accuracy of discharge diagnosis codes compared with chief complaints, which eliminates variability and results in higher outbreak detection sensitivity. Finally, the study was quick and easy to conduct with few resources; it can easily be reproduced and could enhance collaboration among public health, academic investigators, and industry.
As a result of this study, we are installing a real-time alert for ILI into CREDO, so that ILI can be added to the diseases currently monitored by VCOM faculty. These high-quality data are amenable to automated computational analysis for generating alerts that identify and track the progress of future pandemics. There are a number of advantages to using CREDO data. First, the data are real-time, as medical students are continuously logging their clinical training patient encounters. Second, data are collected over a very large number of distributed rural and urban sites, from large metropolitan hospitals to small clinics and individual practices in socioeconomically disadvantaged areas, thus providing a method for surveilling a large region of the US, Central America, and the Caribbean without the need to extract and merge data from a variety of EMR systems. Third, the data are uniformly collected by clinically trained professionals simultaneously across a plurality of indications, thus providing well-matched controls. Last, the CREDO system is extensible and, if used by most or all health care trainees (medical, physician assistant, nursing, and pharmacy students), could ensure coverage across the entire US at even higher data rates, thus improving surveillance performance.
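To make the idea of an automated alert concrete, the sketch below flags days on which a daily ILI encounter count rises well above its recent baseline. It is an illustration only and not the alert being installed in CREDO: the function name, the 7-day baseline window, the threshold of 3 standard deviations, the floor on the standard deviation, and the example counts are all assumptions made for this example.

```python
from statistics import mean, stdev


def moving_baseline_alerts(daily_counts, baseline_days=7, threshold=3.0):
    """Return indices of days whose count exceeds the recent baseline.

    For each day, the baseline is the preceding `baseline_days` counts.
    A day is flagged when (count - baseline mean) / baseline SD > threshold.
    """
    alerts = []
    for day in range(baseline_days, len(daily_counts)):
        baseline = daily_counts[day - baseline_days:day]
        mu = mean(baseline)
        # Floor the SD (assumed value) so a perfectly flat baseline
        # does not cause a division by zero.
        sd = max(stdev(baseline), 0.5)
        z = (daily_counts[day] - mu) / sd
        if z > threshold:
            alerts.append(day)
    return alerts


# Hypothetical daily ILI encounter counts (illustration only).
counts = [4, 5, 3, 6, 4, 5, 4, 5, 4, 15, 6]
print(moving_baseline_alerts(counts))
```

A rule this simple ignores seasonality and day-of-week effects; a production alert would likely need a baseline that accounts for both.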