Medical Education | August 2016
Correlation Between Scores on Weekly Quizzes and Performance on the Annual Resident In-Service Examination
Author Notes
  • From the Departments of Emergency Medicine (Drs Wagner and Ashurst) and Research (Mr Simunich) at Duke Lifepoint Memorial Medical Center in Brentwood, Tennessee, and the Department of Emergency Medicine at the DLP Conemaugh Memorial Medical Center in Johnstown, Pennsylvania (Dr Cooney). 
  • Support: This study was funded by the Lake Erie College of Osteopathic Medicine Research Support Grant. 
  •  *Address correspondence to John V. Ashurst, DO, MSc, Department of Emergency Medicine, Duke Lifepoint Memorial Medical Center, 330 Seven Springs Way, Brentwood, TN 37027-5098. E-mail: ashurst.john.32@gmail.com
     
The Journal of the American Osteopathic Association, August 2016, Vol. 116, 530-534. doi:10.7556/jaoa.2016.106
Abstract

Context: Medical residency education relies heavily on the use of written and oral testing as a means of assessing a learner’s knowledge acquisition. In the United States, osteopathic emergency medicine residents take an annual specialty-based resident in-service examination (RISE) for this purpose. Their performance on the RISE helps direct educators’ approach to teaching and training.

Objectives: To determine the strength of the correlation between residents’ cumulative performance on a series of weekly in-house quizzes and their performance on the RISE.

Methods: In this prospective study, emergency medicine residents took a series of 15 quizzes between August 2013 and January 2014. The quizzes were administered using slides integrated with an audience-response system. Quizzes comprised questions gathered from various question banks and commercial test review resources specific to the specialty of emergency medicine. Effort was made to select questions covering topics tested on the RISE. Scores from each of the quizzes were recorded, and these data were analyzed for correlation with residents’ scores on the RISE.

Results: Sixteen emergency medicine residents from all 4 postgraduate years participated in the study. For various reasons (vacation, illness, away rotations), not all 16 residents participated in each quiz. The mean participation rate over all 15 quizzes was 76.7%, with a mean quiz score of 57.8%. A correlation analysis was conducted between the achieved RISE score and the mean quiz score (excluding any quizzes not taken). Graphical analysis revealed a sufficiently linear relationship between the 2 variables, with no outliers. Both variables were normally distributed, as assessed by the Shapiro-Wilk test (P>.05). A strong positive correlation was found between RISE score and mean quiz score (r(14)=0.75; P=.001), with the mean quiz score over the quizzes taken explaining about 57% of the variance in the achieved RISE score.

Conclusions: The results of this study imply that performance on weekly didactic quizzes may be strongly predictive of RISE performance; as such, tracking these data may give educators and learners insight into the most effective direction of their educational efforts.

This Medical Education section represents a new collaboration between the JAOA and the American Association of Colleges of Osteopathic Medicine (AACOM) to recruit, peer review, edit, and distribute articles through the JAOA on osteopathic medical education research and other scholarly issues related to medical education.

Program directors and faculty of medical residency programs are charged with the responsibility of teaching and training medical school graduates to be cognitively, professionally, and technically competent enough to eventually practice medicine independently. In accomplishing this task, various means are used to deliver information to residents and subsequently verify that it is being understood and applied. Assessing the effectiveness of specific educational methods is therefore essential to ensure the highest level of training possible.1-7 Perhaps the most relied-on means of assessing a learner’s educational gains is written and oral testing. Early identification of knowledge deficits and subsequent intervention can lead to improved performance on future high-stakes examinations, such as the resident in-service examination (RISE) and licensing and board examinations. 
Many elements, including medical school attended, age at residency matriculation, residency size, and United States Medical Licensing Examination (USMLE) scores, have been studied to determine their predictive value in board examination performance.8-10 Assessing the efficiency and overall effectiveness of methods designed to enhance medical knowledge acquisition is critical to identifying areas in which improvements are needed in this effort. In one study,8 195 internal medicine residents took the RISE 421 times over 4 years; their scores were positively affected by both conference attendance and independent study of online resources. A study of factors common to residents passing the American Board of Surgery examinations on the first attempt identified statistically significant factors, including USMLE scores as well as RISE scores.11 Resident in-service examinations themselves can, of course, be useful in predicting success on the final qualifying board examination and can be used as a means of identifying residents at risk of not passing the qualifying examination.1,12 
With RISE scores shown to correlate well with residents’ early success on qualifying board examinations, it follows that factors affecting RISE performance may also affect qualifying examination performance.1,11-13 Earlier identification of at-risk residents could be made possible through other assessment methods, such as more frequent written examinations or quizzes, and would enable focused corrective intervention by faculty. In this study, we sought to determine the correlative strength between cumulative scores on weekly quizzes and RISE scores for a group of emergency medicine residents. 
Methods
Setting and Population
A prospective study was implemented after institutional review board approval. The study site is a level 1 regional resource trauma center and teaching hospital that provides medical residency programs in emergency medicine, family medicine, internal medicine, and surgery. The emergency medicine residency is a 4-year osteopathic program that had a total of 17 residents at the time of the study. All emergency medicine residents, regardless of training level, were eligible to participate. 
Procedure
From August 2013 through January 2014, 15 quizzes were administered to participating emergency medicine residents. The principal investigator (B.J.W.) constructed a quiz each week using questions obtained from commercially available review resources specific to the specialty of emergency medicine and pertinent to topics covered on the RISE. Fourteen of the quizzes consisted of 10 questions each, and 1 quiz had 8 questions. The quiz presentations were created in PowerPoint (Microsoft Corporation) and integrated with TurningPoint (Turning Technologies) audience response software. 
Each participant was anonymously assigned a clicker compatible with TurningPoint polling software. The clickers enabled recording of participant answers in real time. To ensure that the data collected from each clicker were consistently matched to the same participant, the clickers were labeled with the participants’ names and stored in a secure location. Each clicker was also identified by a unique 6-character alphanumeric electronic signature. Researchers were blinded to the clicker-participant pairing; only the electronic signatures of the clickers were included in the data collection. 
At the beginning of each weekly didactic session, the clickers were taken from their secure location and given to participants. The presentation was opened after all participants for that week were present. For each question, participants were given 90 seconds to read the question stem and click the button that corresponded to their answer (A, B, C, or D). A countdown timer visible to all participants on the PowerPoint slide alerted them to the remaining time. The TurningPoint software automatically stopped receiving input from participants once the allotted time had expired. More than 90 seconds were allotted if additional material pertinent to the question required review (eg, images, electrocardiograph results). At the conclusion of each quiz, individual response data were saved and the clickers were returned to the secure location. 
Participants were required to be present to participate in the quizzes. Owing to factors such as vacations, away rotations, and illness, not all participants were present for each quiz. 
Statistical Analysis
Summary calculations for each participant, accounting for only the quizzes that participant attended, were used to produce overall statistics. The Shapiro-Wilk test was used to assess the normality of the distributions, and the Pearson correlation was used to assess the relationship between the RISE score and the percentage of correct quiz answers for each participant. An α level of .05 was used as the criterion for statistical significance. SPSS version 19 (IBM) was used for all statistical analyses. 
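For readers who wish to reproduce this type of analysis, the sketch below runs the same sequence of tests (Shapiro-Wilk, Pearson correlation, simple linear regression) in Python with SciPy; the study itself used SPSS version 19, and the score arrays shown here are hypothetical placeholders rather than study data.

```python
# Minimal sketch of the analysis described above (hypothetical data, not study data).
import numpy as np
from scipy import stats

# Per-resident mean quiz score (% correct over quizzes taken) and RISE score (%).
mean_quiz = np.array([45.0, 52.5, 58.0, 61.0, 55.5, 63.0, 49.0, 70.0,
                      66.5, 57.0, 60.5, 53.0, 68.0, 62.0, 48.5, 59.0])
rise = np.array([48.0, 55.0, 60.0, 62.5, 57.0, 66.0, 50.0, 72.0,
                 68.0, 58.5, 63.0, 54.0, 70.0, 64.0, 51.0, 61.0])

# Shapiro-Wilk test for normality of each variable
# (P > .05 supports using the Pearson correlation).
for name, scores in (("mean quiz", mean_quiz), ("RISE", rise)):
    w, p = stats.shapiro(scores)
    print(f"Shapiro-Wilk {name}: W = {w:.3f}, P = {p:.3f}")

# Pearson correlation between mean quiz score and RISE score.
r, p = stats.pearsonr(mean_quiz, rise)
print(f"r({len(rise) - 2}) = {r:.2f}, P = {p:.3f}, r^2 = {r ** 2:.2f}")

# Simple linear regression: does the mean quiz score predict the RISE score?
fit = stats.linregress(mean_quiz, rise)
print(f"RISE ≈ {fit.intercept:.1f} + {fit.slope:.2f} x mean quiz score")
```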
Results
Sixteen emergency medicine residents met the inclusion criterion and gave voluntary verbal consent to participate in the study. No resident participated in all 15 quizzes. The mean percentage of participation over all 15 quizzes was 76.7%, with a mean quiz score of 57.8%. The mean RISE score was 59.8%. 
A Pearson correlation analysis was conducted between the RISE score achieved by each participant and the mean quiz score over all quizzes taken per participant (excluding any quizzes not taken). Graphical analysis revealed a sufficiently linear relationship between the 2 variables, with no outliers. Both variables were normally distributed. A strong positive correlation existed between RISE score and mean quiz score (r(14)=0.75; P=.001), with the mean quiz score explaining about 56.7% of the variance in the achieved RISE score. A linear regression established that the mean quiz score could predict the RISE score (F(1,14)=18.29; P=.001). 
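As an illustrative arithmetic check (our annotation, not output reported by the study), the variance-explained figure and the regression F statistic both follow directly from the reported correlation coefficient and its degrees of freedom:

```python
# Consistency check of the reported statistics (annotation, not study output).
r, df = 0.753, 14                          # reported r (rounded to 0.75) and df
print(f"r^2 = {r ** 2:.3f}")               # ≈ 0.567, ie, about 56.7% of the variance
t = r * (df / (1 - r ** 2)) ** 0.5         # t statistic implied by r and df
print(f"F(1,{df}) = t^2 = {t ** 2:.1f}")   # ≈ 18.3, consistent with F(1,14) = 18.29
```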
Discussion
Mean scores on weekly quizzes correlated with performance on the RISE: the residents with the highest cumulative weekly quiz scores achieved the highest scores on the RISE, and the residents with the lowest cumulative quiz scores had the poorest performance on the RISE. However, the small sample size limits the widespread applicability of the data. 
The findings in the current study demonstrate that the administration of weekly quizzes can provide information that may help teaching faculty identify residents at risk for poor performance on the RISE. The findings may also contribute to an emerging shift in how assessment validity is conceptualized. 
In 1989, Samuel J. Messick, PhD, defined assessment validity as “the degree to which evidence and theory support the interpretations of test scores for proposed uses of tests.”14,15 More recently, the concept of consequence validity has emerged in the medical education literature.16 Consequence validity examines the effects of both clinical and educational assessments on those being assessed.16 Instead of asking, “Are we measuring what we think we are measuring?” consequence validity asks, “Does the activity of measuring and the subsequent interpretation and application of scores achieve our desired results with few negative side effects?”16 In the current study, the intervention was the weekly quizzes, and the consequence of interest was whether taking the weekly quizzes may alter RISE scores. 
Other research has sought to identify historical factors that could be reliably used to predict performance on future high-stakes examinations.10,11,13,17 Although no historical factor is foolproof, some factors can provide useful information in anticipating residents’ future educational success.10,11,13,17 Weekly quizzes administered using the protocol in the current study may be useful for other residency programs in the prediction of performance on the RISE. Early identification of residents at risk for poor performance on the RISE and certifying examinations may allow for timely educational intervention. 
The response to intervention process is loosely described as … 

a process in which students are provided quality instruction, their progress is monitored, those who do not respond appropriately are provided additional instruction and their progress is monitored, and those who continue to not respond appropriately are considered for special education services.18

 
This process is currently studied in grade schools but has yet to gain momentum in medical education. Future research should assess the utility of response to intervention in graduate medical education. 
It is incumbent upon the educators of future physicians to ensure that residents obtain the highest level of knowledge and ability needed to safely and competently care for their patients. In doing so, it is imperative that these educators use effective and proven means of accomplishing these goals. Continual monitoring of residents’ skills and knowledge acquisition ensures progress toward critical goals. Although certifying examinations provide the ultimate measure of trainee competence, they occur once residency training is completed. The RISE is a means of assessing resident progress annually; however, more frequent gauging of resident progress may be attainable through weekly quizzes. 
Limitations
The small sample size imposed several limitations on the validity of the data obtained. It is unclear whether the data can be applied to larger populations of residents, to other specialties, or to other programs within the same specialty. Casual observation of other specialty training programs within the study facility revealed that didactic lectures and educational sessions vary in how they are conducted. These differences may make it difficult to administer frequent testing such as that carried out in the current study. 
In addition, the weight of each subject category tested did not necessarily reflect the corresponding weight of that subject on the RISE. In other words, a category such as “pediatric emergencies” that was weighted differently in this study than on the RISE could have affected outcomes. A more accurate reflection of the RISE question breakdown could theoretically have influenced the results of this study, because residents may have had differing levels of competence in each category. A general breakdown of tested subject categories can be found in various review materials. Although some effort was made to match these sources, it is difficult if not impossible to predict the exact breakdown and weight of each category on the actual RISE. 
Another limitation was the setting of the quizzes, which differed greatly from the formal, controlled environment in which the RISE is administered. Steps were taken to ensure that minimal interaction took place between residents during quiz administration; however, the nature of the study setting prevented 100% compliance with this goal. Many factors related to the testing venue differed, and the casual setting in which the quizzes were administered may have influenced quiz scores. 
Future Directions for Research
A similar study with a larger population and a cross-program, cross-specialty design would address questions about the feasibility of incorporating weekly quizzes into the curricula of different specialties and of different programs within the same specialty. In future studies, effort should be made to ensure that the breakdown of question categories on the weekly quizzes reflects the breakdown of question categories on the RISE as closely as possible. Close cooperation with the individuals who produce the RISE could align the content of the weekly quizzes more closely with that of the RISE. This effort would yield higher-fidelity data and eliminate a significant limitation. 
Conclusion
Whether weekly quiz administration will ultimately prove to be a reliable means of predicting RISE scores has yet to be definitively answered. Although a small sample size places restraints on the interpretation and application of data, the results of the present study imply that tracking performance on weekly didactic quizzes may be strongly predictive of RISE performance. Furthermore, quiz performance may provide insight to educators and learners as to the most effective direction of their educational efforts. Future research in this area should involve a large sample size and a cross-program and cross-specialty design. 
References
1. Aeder L, Fogel J, Schaeffer H. Pediatric board review course for residents “at risk.” Clin Pediatr (Phila). 2010;49(5):450-456. doi:10.1177/0009922809352679.
2. Shokar GS. The effects of an educational intervention for “at-risk” residents to improve their scores on the in-training exam. Fam Med. 2003;35(6):414-417.
3. Davis DA, Thomson MA, Oxman AD, Haynes RB. Changing physician performance: a systematic review of the effect of continuing medical education strategies. JAMA. 1995;274(9):700-705.
4. Davis D, O’Brien MA, Freemantle N, Wolf FM, Mazmanian P, Taylor-Vaisey A. Impact of formal continuing medical education: do conferences, workshops, rounds, and other traditional continuing education activities change physician behavior or health care outcomes? JAMA. 1999;282(9):867-874.
5. Haidet P, Morgan RO, O’Malley K, Moran BJ, Richards BF. A controlled trial of active versus passive learning strategies in a large group setting. Adv Health Sci Educ Theory Pract. 2004;9(1):15-27.
6. Haidet P, O’Malley KJ, Richards B. An initial experience with “team learning” in medical education. Acad Med. 2002;77(1):40-44.
7. Long DM. Competency-based residency training: the next advance in graduate medical education. Acad Med. 2000;75(12):1178-1183.
8. McDonald FS, Zeger SL, Kolars JC. Factors associated with medical knowledge acquisition during internal medicine residency. J Gen Intern Med. 2007;22(7):962-968.
9. Falcone JL, Gonzalo JD. Relationship between internal medicine program board examination pass rates, accreditation standards, and program size. Int J Med Educ. 2014;5:11-14. doi:10.5116/ijme.52c5.6602.
10. McCaskill QE, Kirk JJ, Barata DM, Wludyka PS, Zenni EA, Chiu TT. USMLE step 1 scores as a significant predictor of future board passage in pediatrics. Ambul Pediatr. 2007;7(2):192-195.
11. Shellito JL, Osland JS, Helmer SD, Chang FC. American Board of Surgery examinations: can we identify surgery residency applicants and residents who will pass the examinations on the first attempt? Am J Surg. 2010;199(2):216-222. doi:10.1016/j.amjsurg.2009.03.006.
12. Garvin PJ, Kaminski DL. Significance of the in-training examination in a surgical residency program. Surgery. 1984;96(1):109-113.
13. Perez JA Jr, Greer S. Correlation of United States Medical Licensing Examination and internal medicine in-training examination performance. Adv Health Sci Educ Theory Pract. 2009;14(5):753-758. doi:10.1007/s10459-009-9158-2.
14. Messick S. Validity. In: Linn RL, ed. Educational Measurement. 3rd ed. New York, NY: American Council on Education and Macmillan; 1989:13-103.
15. American Educational Research Association, American Psychological Association, National Council on Measurement in Education. Validity. In: Standards for Educational and Psychological Testing. Washington, DC: American Educational Research Association; 2014:11-31.
16. Cook DA, Lineberry M. Consequences validity evidence: evaluating the impact of educational assessments. Acad Med. 2016;91(6):785-795. doi:10.1097/ACM.0000000000001114.
17. Kanna B, Gu Y, Akhuetie J, Dimitrov V. Predicting performance using background characteristics of international medical graduates in an inner-city university-affiliated internal medicine residency training program. BMC Med Educ. 2009;9:42. doi:10.1186/1472-6920-9-42.
18. Fuchs D, Mock D, Morgan PL, Young CL. Responsiveness-to-intervention: definitions, evidence, and implications for the learning disabilities construct. Learn Disabil Res Pract. 2003;18(3):157-171. doi:10.1111/1540-5826.00072.