Search results
The Nuffield Department of Surgical Sciences is the academic department of surgery at the University of Oxford, and hosts a multidisciplinary team of senior clinical academic surgeons, senior scientists, junior clinicians and scientists in training.
Late Treatment With Autologous Expanded Regulatory T-cell Therapy After Alemtuzumab Induction Is Safe and Facilitates Immunosuppression Minimization in Living Donor Renal Transplantation.
BACKGROUND: The TWO Study (Transplantation Without Overimmunosuppression) aimed to investigate a novel approach to regulatory T-cell (Treg) therapy in renal transplant patients, using a delayed infusion protocol at 6 mo posttransplant to promote a Treg-skewed lymphocyte repopulation after alemtuzumab induction. We hypothesized that this would allow safe weaning of immunosuppression to tacrolimus alone. The COVID-19 pandemic led to the suspension of alemtuzumab use, and therefore, we report the unique cohort of 7 patients who underwent the original randomized controlled trial protocol. This study presents a unique insight into Treg therapy combined with alemtuzumab and is therefore an important proof of concept for studies in other diseases that are considering lymphodepletion. METHODS: Living donor kidney transplant recipients were randomized to receive autologous polyclonal Treg at week 26 posttransplantation, coupled with weaning doses of tacrolimus, (Treg therapy arm) or standard immunosuppression alone (tacrolimus and mycophenolate mofetil). Primary outcomes were patient survival and rejection-free survival. RESULTS: Successful cell manufacturing and cryopreservation until the 6-mo infusion were achieved. Patient and transplant survival was 100%. Acute rejection-free survival was 100% in the Treg-treated group at 18 mo after transplantation. Although alemtuzumab caused a profound depletion of all lymphocytes, including Treg, after cell therapy infusion, there was a transient increase in peripheral Treg numbers. CONCLUSIONS: The study establishes that delayed autologous Treg therapy is both feasible and safe, even 12 mo after cell production. The findings present a new treatment protocol for Treg therapy, potentially expanding its applications to other indications.
Reporting guideline for the early-stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI.
A growing number of artificial intelligence (AI)-based clinical decision support systems are showing promising performance in preclinical, in silico evaluation, but few have yet demonstrated real benefit to patient care. Early-stage clinical evaluation is important to assess an AI system's actual clinical performance at small scale, ensure its safety, evaluate the human factors surrounding its use and pave the way to further large-scale trials. However, the reporting of these early studies remains inadequate. The present statement provides a multi-stakeholder, consensus-based reporting guideline for the Developmental and Exploratory Clinical Investigations of DEcision support systems driven by Artificial Intelligence (DECIDE-AI). We conducted a two-round, modified Delphi process to collect and analyze expert opinion on the reporting of early clinical evaluation of AI systems. Experts were recruited from 20 pre-defined stakeholder categories. The final composition and wording of the guideline was determined at a virtual consensus meeting. The checklist and the Explanation & Elaboration (E&E) sections were refined based on feedback from a qualitative evaluation process. In total, 123 experts participated in the first round of Delphi, 138 in the second round, 16 in the consensus meeting and 16 in the qualitative evaluation. The DECIDE-AI reporting guideline comprises 17 AI-specific reporting items (made of 28 subitems) and ten generic reporting items, with an E&E paragraph provided for each. Through consultation and consensus with a range of stakeholders, we developed a guideline comprising key items that should be reported in early-stage clinical studies of AI-based decision support systems in healthcare. By providing an actionable checklist of minimal reporting items, the DECIDE-AI guideline will facilitate the appraisal of these studies and replicability of their findings.
The IDEAL framework for surgical robotics: development, comparative evaluation and long-term monitoring.
The next generation of surgical robotics is poised to disrupt healthcare systems worldwide, requiring new frameworks for evaluation. However, evaluation during a surgical robot's development is challenging due to their complex evolving nature, potential for wider system disruption and integration with complementary technologies like artificial intelligence. Comparative clinical studies require attention to intervention context, learning curves and standardized outcomes. Long-term monitoring needs to transition toward collaborative, transparent and inclusive consortiums for real-world data collection. Here, the Idea, Development, Exploration, Assessment and Long-term monitoring (IDEAL) Robotics Colloquium proposes recommendations for evaluation during development, comparative study and clinical monitoring of surgical robots, providing practical recommendations for developers, clinicians, patients and healthcare systems. Multiple perspectives are considered, including economics, surgical training, human factors, ethics, patient perspectives and sustainability. Further work is needed on standardized metrics, health economic assessment models and global applicability of recommendations.
Holistic Human-Serving Digitization of Health Care Needs Integrated Automated System-Level Assessment Tools.
Digital health tools, platforms, and artificial intelligence- or machine learning-based clinical decision support systems are increasingly part of health delivery approaches, with an ever-greater degree of system interaction. Critical to the successful deployment of these tools is their functional integration into existing clinical routines and workflows. This depends on system interoperability and on intuitive and safe user interface design. The importance of minimizing emergent workflow stress through human factors research and purposeful design for integration cannot be overstated. Usability of tools in practice is as important as algorithm quality. Regulatory and health technology assessment frameworks recognize the importance of these factors to a certain extent, but their focus remains mainly on the individual product rather than on emergent system and workflow effects. The measurement of performance and user experience has so far been performed in ad hoc, nonstandardized ways by individual actors using their own evaluation approaches. We propose that a standard framework for system-level and holistic evaluation could be built into interacting digital systems to enable systematic and standardized system-wide, multiproduct, postmarket surveillance and technology assessment. Such a system could be made available to developers through regulatory or assessment bodies as an application programming interface and could be a requirement for digital tool certification, just as interoperability is. This would enable health systems and tool developers to collect system-level data directly from real device use cases, enabling the controlled and safe delivery of systematic quality assessment or improvement studies suitable for the complexity and interconnectedness of clinical workflows using developing digital health technologies.
Association of Clinician Diagnostic Performance With Machine Learning-Based Decision Support Systems: A Systematic Review.
IMPORTANCE: An increasing number of machine learning (ML)-based clinical decision support systems (CDSSs) are described in the medical literature, but this research focuses almost entirely on comparing CDSSs directly with clinicians (human vs computer). Little is known about the outcomes of these systems when used as adjuncts to human decision-making (human vs human with computer). OBJECTIVES: To conduct a systematic review to investigate the association between the interactive use of ML-based diagnostic CDSSs and clinician performance and to examine the extent of the CDSSs' human factors evaluation. EVIDENCE REVIEW: A search of MEDLINE, Embase, PsycINFO, and grey literature was conducted for the period between January 1, 2010, and May 31, 2019. Peer-reviewed studies published in English comparing human clinician performance with and without interactive use of an ML-based diagnostic CDSS were included. All metrics used to assess human performance were considered as outcomes. The risk of bias was assessed using Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) and Risk of Bias in Non-Randomised Studies-Intervention (ROBINS-I). Narrative summaries were produced for the main outcomes. Given the heterogeneity of medical conditions, outcomes of interest, and evaluation metrics, no meta-analysis was performed. FINDINGS: A total of 8112 studies were initially retrieved and 5154 abstracts were screened; of these, 37 studies met the inclusion criteria. The median number of participating clinicians was 4 (interquartile range, 3-8). Of the 107 results that reported statistical significance, 54 (50%) were increased by the use of CDSSs, 4 (4%) were decreased, and 49 (46%) showed no change or an unclear change. In the subgroup of studies carried out in representative clinical settings, no association between the use of ML-based diagnostic CDSSs and improved clinician performance could be observed.
Interobserver agreement was the commonly reported outcome whose change was the most strongly associated with CDSS use. Four studies (11%) reported on user feedback, and, in all but 1 case, clinicians decided to override at least some of the algorithms' recommendations. Twenty-eight studies (76%) were rated as having a high risk of bias in at least 1 of the 4 QUADAS-2 core domains, and 6 studies (16%) were considered to be at serious or critical risk of bias using ROBINS-I. CONCLUSIONS AND RELEVANCE: This systematic review found only sparse evidence that the use of ML-based CDSSs is associated with improved clinician diagnostic performance. Most studies had a low number of participants, were at high or unclear risk of bias, and showed little or no consideration for human factors. Caution should be exercised when estimating the current potential of ML to improve human diagnostic performance, and more comprehensive evaluation should be conducted before deploying ML-based CDSSs in clinical settings. The results highlight the importance of considering supported human decisions as end points rather than merely the stand-alone CDSSs outputs.
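The percentages quoted in this abstract can be rechecked from its raw counts. The following is a minimal Python sketch that uses only the counts stated above; the helper name `pct` is an illustrative assumption and is not taken from the review.

```python
# Recompute the percentages quoted in the abstract from its raw counts.
# `pct` is a hypothetical helper introduced here for illustration only.

def pct(count: int, total: int) -> int:
    """Return count/total as a percentage rounded to the nearest integer."""
    return round(100 * count / total)

# Results that reported statistical significance (107 in total).
significance = {"increased": 54, "decreased": 4, "no or unclear change": 49}
total_results = sum(significance.values())
result_pcts = {name: pct(n, total_results) for name, n in significance.items()}

# Study-level counts, out of the 37 included studies.
n_studies = 37
study_pcts = {
    "user feedback reported": pct(4, n_studies),
    "high risk of bias (QUADAS-2)": pct(28, n_studies),
    "serious or critical risk of bias (ROBINS-I)": pct(6, n_studies),
}

if __name__ == "__main__":
    print(total_results, result_pcts)
    print(study_pcts)
```

The three result categories sum to 107, and each rounded percentage agrees with the figure given in the abstract.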
The IDEAL Reporting Guidelines: A Delphi Consensus Statement Stage Specific Recommendations for Reporting the Evaluation of Surgical Innovation.
OBJECTIVE: The aim of this study was to define reporting standards for IDEAL format studies. BACKGROUND: The IDEAL Framework and Recommendations establish an integrated pathway for evaluation of new surgical techniques and complex therapeutic technologies. However, guidance on implementation has been incomplete, and incorrect use is commonly seen. We describe the consensus development of reporting guidelines for the IDEAL stages, and plans for their dissemination and evaluation. METHODS: Using the EQUATOR Network recommendations, participants with knowledge of IDEAL were surveyed to determine which IDEAL stages needed reporting guidelines. Draft checklists for stages 1, 2a, 2b, and 4 were subsequently developed by 3 researchers (N.B., A.H., P.M.), and revised through a 2-round Delphi consensus process. A final consensus teleconference resolved outstanding disagreements and clarified wording for checklist items. RESULTS: Sixty-one participants completed the initial survey, a clear majority indicating that new reporting guidelines were needed for IDEAL Stage 1 (69.5%), Stage 2a (78%), Stage 2b (74.6%), and Stage 4 (66%). A proposed set of checklists was modified by survey participants in 2 online Delphi rounds (n = 54 and n = 47, respectively), resulting in a penultimate checklist for each stage. Fourteen expert working group members finalized the checklist items and successfully resolved any outstanding areas without agreement on a consensus call. CONCLUSIONS: Participants familiar with IDEAL called for reporting guidelines for studies in all IDEAL stages except stage 3. The checklists developed have the potential to improve standards of reporting and thereby advance the quality of research on surgery and complex interventions and technologies, but require further evaluation in use.
Early development of decision support systems based on artificial intelligence: an application to postoperative complications and a cross-specialty reporting guideline for early-stage clinical evaluation
Background: Complications after major surgery occur in a similar manner internationally, but the success of the response process in preventing death varies widely depending on speed and appropriateness. Artificial intelligence (AI) offers new opportunities to provide support to the decision making of clinicians in this stressful situation when uncertainty is high. However, few AI systems have been robustly and successfully tested in real-world clinical settings. Whilst preparing to develop an AI decision support algorithm and planning to evaluate it in real-world settings, a lack of appropriate guidance on reporting early clinical evaluation of such systems was identified. Objectives: The objectives of this work were twofold: i) to develop a prototype of an AI system to improve the management of postoperative complications; and ii) to understand expert consensus on reporting standards for early-stage evaluation of AI systems in live clinical settings. Methods: I conducted and thematically analysed interviews with clinicians to identify their main challenges and support needs when managing postoperative complications. I then systematically reviewed the literature on the impact of AI-based decision support systems on clinicians’ diagnostic performance. A model based on unsupervised clustering and providing prescription recommendations was developed, optimised, and tested on an internal hold-out dataset. Finally, I conducted a Delphi process to reach expert consensus on minimum reporting standards for the early-stage clinical evaluation of AI systems in live clinical settings. Results: 12 interviews were conducted with junior and senior clinicians identifying 54 themes about challenges, common errors, strategies, and support needs when managing postoperative complications. 37 studies were included in the systematic review, which found no robust evidence of a positive association between the use of AI decision support systems and improved clinician diagnostic performance.
The developed algorithm showed no improvement in recall at position ten compared to a list of the most common prescriptions in the study population. When considering the prevalence of the individual prescriptions, the algorithm showed a 12% relative increase in performance compared to the same baseline. 151 experts participated in the Delphi study, representing 18 countries and 20 stakeholder groups. The final DECIDE-AI checklist comprises 27 items, accompanied by Explanation & Elaboration sections for each. Conclusion: The proposed algorithm offers a proof of concept for an AI system to improve the management of postoperative complications. However, it needs further development and evaluation before claiming clinical utility. The DECIDE-AI guideline provides a practicable checklist for researchers reporting on the implementation of AI decision support systems in clinical settings, and merits future iterative evaluation-update cycles in practice.
Artificial intelligence in medical device software and high-risk medical devices - a review of definitions, expert recommendations and regulatory initiatives.
INTRODUCTION: Artificial intelligence (AI) encompasses a wide range of algorithms with risks when used to support decisions about diagnosis or treatment, so professional and regulatory bodies are recommending how they should be managed. AREAS COVERED: AI systems may qualify as standalone medical device software (MDSW) or be embedded within a medical device. Within the European Union (EU) AI software must undergo a conformity assessment procedure to be approved as a medical device. The draft EU Regulation on AI proposes rules that will apply across industry sectors, while for devices the Medical Device Regulation also applies. In the CORE-MD project (Coordinating Research and Evidence for Medical Devices), we have surveyed definitions and summarize initiatives made by professional consensus groups, regulators, and standardization bodies. EXPERT OPINION: The level of clinical evidence required should be determined according to each application and to legal and methodological factors that contribute to risk, including accountability, transparency, and interpretability. EU guidance for MDSW based on international recommendations does not yet describe the clinical evidence needed for medical AI software. Regulators, notified bodies, manufacturers, clinicians and patients would all benefit from common standards for the clinical evaluation of high-risk AI applications and transparency of their evidence and performance.
Examining the empirical evidence for IDEAL 2b studies: the effects of preceding prospective collaborative cohort studies on the quality and impact of subsequent randomized controlled trials of surgical innovations - protocol for a systematic review and case-control analysis.
Randomized controlled trials (RCTs) in surgery face methodological challenges, which often result in low quality or failed trials. The Idea, Development, Exploration, Assessment and Long-term (IDEAL) framework proposes preliminary prospective collaborative cohort studies with specific properties (IDEAL 2b studies) to increase the quality and feasibility of surgical RCTs. Little empirical evidence exists for this proposition, and specifically designed 2b studies are currently uncommon. Prospective collaborative cohort studies are, however, relatively common, and might provide similar benefits. We will, therefore, assess the association between prior 'IDEAL 2b-like' cohort studies and the quality and impact of surgical RCTs. We propose a systematic review using two parallel case-control analyses, with surgical RCTs as subjects and study quality and journal impact factor (IF) as the outcomes of interest. We will search for surgical RCTs published between 2015 and 2019 and prior prospective collaborative cohort studies authored by any of the RCT investigators. RCTs will be categorized into cases or controls by (1) journal (IF ≥ or < 5) and (2) study quality (PEDro score ≥ or < 7). The case/control OR of exposure to a prior '2b-like' study will be calculated independently for quality and impact. Cases will be matched 1:1 with controls by year of publication, and confounding by peer-reviewed funding, author academic affiliation and trial protocol registration will be examined using multiple logistic regression analysis. This study will examine whether preparatory IDEAL 2b-like studies are associated with higher quality and impact of subsequent RCTs.
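The case/control odds ratio of exposure mentioned in this protocol can be illustrated with a 2x2 table. The following is a minimal Python sketch, assuming an unmatched analysis and a Woolf (log-based) 95% confidence interval; the function names and the counts are hypothetical and are not data from the protocol, which also plans 1:1 matching and logistic regression that this sketch does not reproduce.

```python
import math

def odds_ratio(exp_cases: int, unexp_cases: int,
               exp_controls: int, unexp_controls: int) -> float:
    """Unmatched odds ratio of exposure: (a * d) / (b * c) for a 2x2 table."""
    return (exp_cases * unexp_controls) / (unexp_cases * exp_controls)

def woolf_ci(exp_cases: int, unexp_cases: int,
             exp_controls: int, unexp_controls: int,
             z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% CI for the odds ratio using the Woolf log method."""
    log_or = math.log(odds_ratio(exp_cases, unexp_cases,
                                 exp_controls, unexp_controls))
    se = math.sqrt(1 / exp_cases + 1 / unexp_cases
                   + 1 / exp_controls + 1 / unexp_controls)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)

# HYPOTHETICAL counts, chosen only to illustrate the arithmetic:
# "cases" = RCTs above a threshold, "exposed" = preceded by a 2b-like study.
counts = dict(exp_cases=30, unexp_cases=20, exp_controls=15, unexp_controls=35)

or_value = odds_ratio(**counts)
ci_low, ci_high = woolf_ci(**counts)

if __name__ == "__main__":
    print(f"OR = {or_value:.2f}, 95% CI {ci_low:.2f} to {ci_high:.2f}")
```

An odds ratio above 1 whose interval excludes 1 would suggest that exposed RCTs are more often cases; with matched pairs, a conditional analysis would normally be preferred over this unmatched form.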
Automated operative workflow analysis of endoscopic pituitary surgery using machine learning: development and preclinical evaluation (IDEAL stage 0).
OBJECTIVE: Surgical workflow analysis involves systematically breaking down operations into key phases and steps. Automatic analysis of this workflow has potential uses for surgical training, preoperative planning, and outcome prediction. Recent advances in machine learning (ML) and computer vision have allowed accurate automated workflow analysis of operative videos. In this Idea, Development, Exploration, Assessment, Long-term study (IDEAL) stage 0 study, the authors sought to use Touch Surgery for the development and validation of an ML-powered analysis of phases and steps in the endoscopic transsphenoidal approach (eTSA) for pituitary adenoma resection, a first for neurosurgery. METHODS: The surgical phases and steps of 50 anonymized eTSA operative videos were labeled by expert surgeons. Forty videos were used to train a combined convolutional and recurrent neural network model by Touch Surgery. Ten videos were used for model evaluation (accuracy, F1 score), comparing the phase and step recognition of surgeons to the automatic detection of the ML model. RESULTS: The longest phase was the sellar phase (median 28 minutes), followed by the nasal phase (median 22 minutes) and the closure phase (median 14 minutes). The longest steps were step 5 (tumor identification and excision, median 17 minutes); step 3 (posterior septectomy and removal of sphenoid septations, median 14 minutes); and step 4 (anterior sellar wall removal, median 10 minutes). There were substantial variations within the recorded procedures in terms of video appearances, step duration, and step order, with only 50% of videos containing all 7 steps performed sequentially in numerical order. Despite this, the model was able to output accurate recognition of surgical phases (91% accuracy, 90% F1 score) and steps (76% accuracy, 75% F1 score). CONCLUSIONS: In this IDEAL stage 0 study, ML techniques have been developed to automatically analyze operative videos of eTSA pituitary surgery. 
This technology has previously been shown to be acceptable to neurosurgical teams and patients. ML-based surgical workflow analysis has numerous potential uses, such as education (e.g., automatic indexing of contemporary operative videos for teaching), improved operative efficiency (e.g., orchestrating the entire surgical team to a common workflow), and improved patient outcomes (e.g., comparison of surgical techniques or early detection of adverse events). Future directions include the real-time integration of Touch Surgery into the live operative environment as an IDEAL stage 1 (first-in-human) study, and further development of underpinning ML models using larger data sets.
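The accuracy and F1 score used to compare surgeon labels with model output can be illustrated on a toy sequence. The following is a minimal Python sketch, assuming per-frame phase labels and macro-averaged F1; the abstract does not state the averaging scheme, and the label sequences below are hypothetical, not study data.

```python
def accuracy(y_true: list[str], y_pred: list[str]) -> float:
    """Fraction of positions where the prediction matches the reference."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def f1_for_label(y_true: list[str], y_pred: list[str], label: str) -> float:
    """F1 score for one label, treating it as the positive class."""
    tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
    fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
    fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def macro_f1(y_true: list[str], y_pred: list[str]) -> float:
    """Unweighted mean of per-label F1 (an assumed averaging choice)."""
    labels = sorted(set(y_true))
    return sum(f1_for_label(y_true, y_pred, lab) for lab in labels) / len(labels)

# HYPOTHETICAL per-frame phase labels for one short clip.
surgeon = ["nasal"] * 4 + ["sellar"] * 4 + ["closure"] * 2
model = ["nasal"] * 3 + ["sellar"] * 5 + ["closure"] * 2

acc = accuracy(surgeon, model)
f1 = macro_f1(surgeon, model)

if __name__ == "__main__":
    print(f"accuracy = {acc:.3f}, macro F1 = {f1:.3f}")
```

In this toy clip the model switches to the sellar phase one frame early, which lowers precision for that phase and recall for the nasal phase.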
Beyond the RCT: When are Randomized Trials Unnecessary for New Therapeutic Devices, and What Should We Do Instead?
OBJECTIVE: The aim of this study was to develop an evidence-based framework for evaluation of therapeutic devices, based on ethical principles and clinical evidence considerations. SUMMARY BACKGROUND DATA: Nearly all medical products which do not work solely through chemical action are regulated as medical devices. Their huge range of purposes, mechanisms of action and risks pose challenges for regulation. High-profile implantable device failures have fuelled concerns about the level of clinical evidence needed for market approval. Calls for more rigorous evaluation lack clarity about what kind of evaluation is appropriate, and are commonly interpreted as meaning more randomized controlled trials (RCTs). These are valuable where devices are genuinely new and claim to offer measurable therapeutic benefits. Where this is not the case, RCTs may be inappropriate and wasteful. METHODS: Starting with a set of ethical principles and basic precepts of clinical epidemiology, we developed a sequential decision-making algorithm for identifying when an RCT should be performed to evaluate new therapeutic devices, and when other methods, such as observational study designs and registry-based approaches, are acceptable. RESULTS: The algorithm clearly defines a group of devices where an RCT is deemed necessary, and the associated framework indicates that an IDEAL 2b study should be the default clinical evaluation method where it is not. CONCLUSIONS: The algorithm and recommendations are based on the principles of the IDEAL-D framework for medical device evaluation and appear eminently practicable. Their use would create a safer system for monitoring innovation, and facilitate more rapid detection of potential hazards to patients and the public.
Development and validation of early warning score systems for COVID-19 patients.
COVID-19 is a major, urgent, and ongoing threat to global health. Globally more than 24 million have been infected and the disease has claimed more than a million lives as of November 2020. Predicting which patients will need respiratory support is important to guiding individual patient treatment and also to ensuring sufficient resources are available. The ability of six common Early Warning Scores (EWS) to identify respiratory deterioration, defined as the need for advanced respiratory support (high-flow nasal oxygen, continuous positive airways pressure, non-invasive ventilation, intubation) within a prediction window of 24 h, is evaluated. It is shown that these scores perform sub-optimally at this specific task. Therefore, an alternative EWS based on the Gradient Boosting Trees (GBT) algorithm is developed that is able to predict deterioration within the next 24 h with a high AUROC of 94% and an accuracy, sensitivity, and specificity of 70%, 96%, and 70%, respectively. The GBT model outperformed the best EWS (LDTEWS:NEWS), increasing the AUROC by 14%. Our GBT model makes the prediction based on the current and baseline measures of routinely available vital signs and blood tests.
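Accuracy, sensitivity, and specificity are all derived from a single confusion matrix at one decision threshold. The following is a minimal Python sketch of those definitions; the counts are hypothetical, chosen only to illustrate the arithmetic, and do not represent the study's data or event rate. AUROC is not shown because it summarizes all thresholds and cannot be computed from one confusion matrix.

```python
def binary_metrics(tp: int, fn: int, fp: int, tn: int) -> dict[str, float]:
    """Sensitivity, specificity, accuracy, and prevalence from a 2x2 table."""
    total = tp + fn + fp + tn
    return {
        "sensitivity": tp / (tp + fn),   # true positive rate
        "specificity": tn / (tn + fp),   # true negative rate
        "accuracy": (tp + tn) / total,
        "prevalence": (tp + fn) / total,
    }

# HYPOTHETICAL counts for 1000 prediction windows (not study data).
m = binary_metrics(tp=48, fn=2, fp=285, tn=665)

# Accuracy is the prevalence-weighted mean of sensitivity and specificity.
weighted = (m["prevalence"] * m["sensitivity"]
            + (1 - m["prevalence"]) * m["specificity"])

if __name__ == "__main__":
    print({name: round(value, 3) for name, value in m.items()})
    print(round(weighted, 3))
```

Because accuracy is a prevalence-weighted average, it sits close to specificity whenever the positive class is rare, as in these illustrative counts.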
Multivariate time-series analysis of biomarkers from a dengue cohort offers new approaches for diagnosis and prognosis.
Dengue is a major public health problem worldwide with distinct clinical manifestations: an acute presentation (dengue fever, DF) similar to other febrile illnesses (OFI) and a more severe, life-threatening form (severe dengue, SD). Due to nonspecific clinical presentation during the early phase of dengue infection, differentiating DF from OFI has remained a challenge, and current methods to determine severity of dengue remain poor early predictors. We present a prospective clinical cohort study conducted in Caracas, Venezuela from 2001 to 2005, designed to determine whether clinical and hematological parameters could distinguish DF from OFI, and identify early prognostic biomarkers of SD. From 204 enrolled suspected dengue patients, there were 111 confirmed dengue cases. Piecewise mixed effects regression and nonparametric statistics were used to analyze longitudinal records. Decreased serum albumin and fibrinogen along with increased D-dimer, thrombin-antithrombin complex, activated partial thromboplastin time and thrombin time were prognostic of SD on the day of defervescence. In the febrile phase, the day-to-day rates of change in serum albumin and fibrinogen concentration, along with platelet counts, were significantly decreased in dengue patients compared to OFI, while the day-to-day rates of change of lymphocytes (%) and thrombin time were increased. In dengue patients, the absolute lymphocytes to neutrophils ratio showed a specific temporal increase, enabling classification of dengue patients entering the critical phase with an area under the ROC curve of 0.79. Secondary dengue patients had elongation of thrombin time compared to primary cases, while D-dimer formation (a fibrinolysis marker) always remained lower for secondary compared to primary cases.
Based on partial analysis of 31 viral complete genomes, a high frequency of C-to-T transitions located at the third codon position was observed, suggesting deamination events with five major hot spots of amino acid polymorphic sites outside in non-structural proteins. No association of severe outcome was statistically significant for any of the five major polymorphic sites found. This study offers an improved understanding of dengue hemostasis and a novel way of approaching dengue diagnosis and disease prognosis using piecewise mixed effect regression modeling. It also suggests that a better discrimination of the day of disease can improve the diagnostic and prognostic classification power of clinical variables using ROC curve analysis. The piecewise mixed effect regression model corroborated key early clinical determinants of disease, and offers a time-series approach for future vaccine and pathogenesis clinical studies.
DECIDE-AI: a new reporting guideline and its relevance to artificial intelligence studies in radiology.
DECIDE-AI is a new, stage-specific reporting guideline for the early and live clinical evaluation of decision-support systems based on artificial intelligence (AI). It answers a need for more attention to the human factors influencing clinical AI performance and more transparent reporting of clinical studies investigating AI systems. Given the rapid expansion of AI systems and the concentration of related studies in radiology, these new standards are likely to find a place in radiological literature in the near future. This review highlights some of the specificities of AI as a complex intervention, why a new reporting guideline was needed for early-stage, live evaluation of this technology, and how DECIDE-AI and other AI reporting guidelines can be useful to radiologists and researchers.