King's College London

Accuracy and Efficiency of Machine Learning-Assisted Risk-of-Bias Assessments in "Real-World" Systematic Reviews: A Noninferiority Randomized Controlled Trial

Research output: Contribution to journal › Review article › peer-review

Anneliese Arno, James Thomas, Byron Wallace, Iain J. Marshall, Joanne E. McKenzie, Julian H. Elliott

Original language: English
Pages (from-to): 1001-1009
Number of pages: 9
Journal: Annals of Internal Medicine
Volume: 175
Issue number: 7
Published: 1 Jul 2022

Bibliographical note

Funding Information: Financial Support: This research was jointly funded by a PhD Studentship from University College London and Monash University. Publisher Copyright: © 2022 American College of Physicians. All rights reserved.

Abstract

Background: Automation is a proposed solution for the increasing difficulty of maintaining up-to-date, high-quality health evidence. Evidence assessing the effectiveness of semiautomated data synthesis, such as risk-of-bias (RoB) assessments, is lacking.
Objective: To determine whether RobotReviewer-assisted RoB assessments are noninferior in accuracy and efficiency to assessments conducted with human effort only.
Design: Two-group, parallel, noninferiority, randomized trial (Monash Research Office Project 11256).
Setting: Health-focused systematic reviews using Covidence.
Participants: Systematic reviewers who had not previously used RobotReviewer and who completed Cochrane RoB assessments between February 2018 and May 2020.
Intervention: In the intervention group, reviewers received an RoB form prepopulated by RobotReviewer; in the comparison group, reviewers received a blank form. Studies were assigned in a 1:1 ratio via simple randomization to receive RobotReviewer assistance for either Reviewer 1 or Reviewer 2. Participants were blinded to study allocation before starting work on each RoB form.
Measurements: Co-primary outcomes were the accuracy of individual reviewer RoB assessments and the person-time required to complete individual assessments. Domain-level RoB accuracy was a secondary outcome.
Results: Of the 15 recruited review teams, 7 completed the trial (145 included studies). Integration of RobotReviewer resulted in noninferior overall RoB assessment accuracy (risk difference, -0.014 [95% CI, -0.093 to 0.065]; intervention group: 88.8% accurate assessments; control group: 90.2% accurate assessments). Data were inconclusive for the person-time outcome (RobotReviewer saved 1.40 minutes [CI, -5.20 to 2.41 minutes]).
Limitation: Variability in user behavior and a limited number of assessable reviews led to an imprecise estimate of the time outcome.
Conclusion: In health-related systematic reviews, RoB assessments conducted with RobotReviewer assistance are noninferior in accuracy to those conducted without RobotReviewer assistance.
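
As a reading aid for the accuracy result, the short Python sketch below reproduces the reported risk difference from the two group accuracies and shows how a noninferiority decision is typically read off a confidence interval. The abstract does not state the trial's noninferiority margin, so `ASSUMED_MARGIN` is a hypothetical value chosen only for illustration, and the function names are likewise illustrative rather than taken from the study.

```python
def risk_difference(p_intervention: float, p_control: float) -> float:
    """Difference in proportions, intervention minus control."""
    return p_intervention - p_control


def is_noninferior(ci_lower: float, margin: float) -> bool:
    """Conventional reading: the intervention is noninferior when the lower
    bound of the CI for (intervention - control) lies above -margin."""
    return ci_lower > -margin


# Group accuracies reported in the abstract: 88.8% vs 90.2%.
rd = risk_difference(0.888, 0.902)
print(round(rd, 3))  # -0.014, matching the reported risk difference

# ASSUMPTION: the margin is not given in the abstract; 0.10 is hypothetical.
ASSUMED_MARGIN = 0.10

# Lower bound of the reported 95% CI (-0.093 to 0.065).
print(is_noninferior(-0.093, ASSUMED_MARGIN))
```

Under a stricter hypothetical margin (for example 0.05), the same interval would not support a noninferiority conclusion, which is why the prespecified margin matters when interpreting the result.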
