Search results
The Nuffield Department of Surgical Sciences is the academic department of surgery at the University of Oxford, and hosts a multidisciplinary team of senior clinical academic surgeons, senior scientists, junior clinicians and scientists in training.
Testing the effect of ecolabels on the environmental impact of food purchases in worksite cafeterias: a randomised controlled trial.
BACKGROUND: Reducing the environmental impact of foods consumed is important for meeting climate goals. We aimed to conduct a randomised controlled trial to test whether ecolabels reduce the environmental impact of food selected in worksite cafeterias, alone or in combination with increased availability of more sustainable meal options. METHODS: Worksite cafeterias (n = 96) were randomised to one of three study groups, with 54 included for final analysis. One group was intended to increase the availability of meat-free options, but no change was implemented. Therefore, this group was treated as part of the control, creating two groups: (1) control (no ecolabels) (n = 35), and (2) ecolabels (n = 19). Regression analysis assessed the primary outcome of total environmental impact of hot meals sold over a 6-week period. Secondary outcome analyses explored the individual environmental indicators that composed the total environmental impact score (i.e., greenhouse gas emissions, biodiversity loss, eutrophication, and water scarcity). The mean weekly environmental impact scores of hot meal options over the full 12-week trial period were assessed using hierarchical mixed effects models. RESULTS: There was no significant effect of the intervention on the environmental impact scores of meals sold (mean difference between control and intervention sites: -1.4%, 95%CI: -33.6%, +30.8%). There was no evidence of an effect in mean weekly environmental impact score (-5.4%, 95%CI: -12.6%, +2.5%), nor in any of the four individual environmental indicators (greenhouse gas emissions: -3.6%, 95%CI: -30.7%, 34.3%; biodiversity loss: 2.0%, 95%CI: -25.8%, 40.2%; eutrophication: -2.4%, 95%CI: -29.3%, 34.7%; water scarcity: -0.4%, 95%CI: -28.7%, 39.1%). CONCLUSIONS: Ecolabels may not be an effective tool to shift consumer behaviour in worksite cafeterias towards meals with lower environmental impact.
TRIAL REGISTRATION: The study was pre-registered prospectively on ISRCTN ( https://www.isrctn.com/ISRCTN10268258 ; 06/01/2022).
A service evaluation of the implementation of a novel digital intervention for hypertension self-monitoring and management system in primary care (SHIP): protocol for a mixed methods study.
BACKGROUND: Hypertension is a key risk factor for death and disability, and blood pressure reduction is associated with significant reductions in cardiovascular risk. Large trials have shown that interventions including self-monitoring of blood pressure can reduce blood pressure but real-world data from wider implementation are lacking. AIM: The self-monitoring and management service evaluation in primary care (SHIP) study will evaluate a novel digital intervention for hypertension management and medication titration platform ("Hypertension-Plus") that is currently undergoing initial implementation into primary care in several parts of the UK. METHODS AND ANALYSES: The study will use a mixed methods approach including both quantitative analysis of anonymised electronic health record data and qualitative analyses of interview and customer support log data. Pseudonymised data will be extracted from electronic health records and outcomes compared between those using the digital intervention and their own historical data, as well as to those not registered to the system. The primary outcome will be difference in systolic blood pressure in the 12 months before and after implementation. A further analysis will utilise self-monitored blood pressure data from the Hypertension-Plus system itself. Semi-structured qualitative interviews will be completed with implementation and clinical leads, staff and patients in six general practices located in two different geographical areas in England. Informed by the non-adoption, abandonment, scale-up, spread, and sustainability (NASSS) framework, our analysis will identify the challenges to successful implementation and sustainability of the digital intervention in routine clinical practice and in patients' homes. ETHICS AND DISSEMINATION: The analyses of pseudonymised data were assessed by the sponsor (The University of Oxford) as service evaluation not requiring individual consent and hence did not require ethical approval. 
Ethics approval for the qualitative analyses was provided by Wales REC 4 (21/WA/0280) and individual written informed consent will be gained for all participants. Results will be published in peer-reviewed journals, presented at national and international conferences and disseminated via patient and health service organisations. DISCUSSION: This study will provide an in-depth analysis of the impact and acceptance of initial implementation of a novel digital intervention, enhancing our understanding and supporting more effective implementation of telemonitoring based hypertension management systems for blood pressure control in England.
Deep brain stimulation of the motor thalamus relieves experimentally induced air hunger.
RESEARCH QUESTION: We previously reported that deep brain stimulation (DBS) of the motor thalamus, in a patient with post-stroke tremor, relieved breathlessness associated with COPD. This raised the question of whether motor thalamus DBS mitigates the ascending dyspnoea signal. We therefore sought to conduct a fully powered cohort study of experimentally induced air hunger, an uncomfortable urge to breathe, in patients with motor thalamus DBS "ON" and "OFF". METHODS: 16 patients (three females) with DBS of the ventral intermediate nucleus (VIM) as treatment for tremor underwent hypercapnic air hunger tests, with DBS ON and OFF. Patients rated air hunger on a visual analogue scale (VAS) every 15 s. Hypercapnia and ventilation were matched for ON and OFF states (end-tidal carbon dioxide tension mean±sd 43±4 and 43±4 mmHg, respectively; ventilation 13.7 and 13.4 L·min⁻¹, respectively). Participants' ventilation was constrained to baseline levels by breathing from a 3-L inspiratory reservoir with fixed flow of fresh gas while targeting their resting breathing frequency to a metronome. RESULTS: Overall steady-state air hunger was 52±28%VAS for ON and 67±20%VAS for OFF (p=0.002; two-tailed paired t-test). The mean reduction in air hunger during VIM DBS was -14.4%VAS. DBS of the motor thalamus relieved air hunger in 13 patients, heightened air hunger in two and caused no change in one. CONCLUSION: DBS of the motor thalamus for tremor relief also mitigates the air hunger component of dyspnoea. We posit that DBS of the motor thalamus heightens the gating control of the thalamus modulating the ascending air hunger signal. Extent of relief suggests that thalamic DBS may prove to be a viable therapy for intractable dyspnoea.
Safety and efficacy of different transplant kidney biopsy techniques: comparison of two different coaxial techniques and needle types.
PURPOSE: Percutaneous ultrasound-guided renal biopsy is essential for diagnosing medical renal disorders in transplant kidneys. A variety of techniques have been advocated. The purpose of this study is to evaluate the safety and efficacy of two different coaxial techniques and biopsy devices. METHODS: This single-center dual-arm, observation study cohort included 1831 consecutive transplant kidney biopsies performed over a 68-month period. Two coaxial techniques were used, distinguished by whether the 17 gauge (G) coaxial needle was advanced into the renal cortex (intracapsular technique; IC) or to the edge of the cortex (extracapsular technique; EC). One of two needle types could be used with either technique: an 18G side-cutting (Bard Max-Core or Mission) or an 18G end-cutting (Biopince Ultra) needle. In all cases, the cortical tangential technique was used to reduce the risk of central artery transgression and unnecessary medullary sampling. Patients were monitored for 30 days post-procedurally and complications were evaluated using the SIR adverse event classification. RESULTS: Of the 1831 patients included in the study cohort, 13 suffered severe bleeding complications requiring operative intervention. Of these patients, 8 underwent biopsy with side-cutting needle and IC, 2 with side-cutting needle and approach not specified, 2 with end-cutting needle and IC, and 1 with end-cutting needle and EC. There was no statistically significant difference in the risk of bleeding complications between different coaxial techniques and needle types. However, there was a significantly increased chance of inadequate sampling when comparing the side-cutting needle (1.0%) to the end-cutting needle (0.1%). CONCLUSIONS: Transplant kidney biopsy performed with two different coaxial techniques and needle types did not show differences in bleeding complications. There is an increased risk of inadequate sampling when using side-cutting relative to end-cutting biopsy devices.
Investigations for Suspected Head and Neck Squamous Cell Carcinoma of Unknown Primary (HNSCCUP): A National Cohort Study
Objectives: Head and neck squamous cell carcinoma from unknown primary (HNSCCUP) is a rare and challenging condition. This study aimed to investigate the diagnostic pathways of suspected HNSCCUP patients in the United Kingdom. Methods: A retrospective observational cohort study was conducted, over 5 years from January 2015, in UK Head and Neck centres of consecutive adults undergoing 18F-Fluorodeoxyglucose-PET-CT (PET-CT) within 3 months of diagnosis with metastatic cervical squamous cell carcinoma. Patients with no primary site on examination and no previous head and neck cancer were eligible. Results: Data for 965 patients were received from 57 centres; 68.5% were HPV-related disease. Three investigation cycles were observed: ultrasound with biopsy, cross-sectional imaging (MRI and/or CT) and PET-CT, at median times of 17, 29.5 and 46 days from referral. No primary was identified on PET-CT in 49.8% (n = 478/960). Diagnostic tonsillectomy was performed in 58.2% (n = 278/478) and tongue base mucosectomy (TBM) in 21.7% (n = 104/479). Ipsilateral tonsillectomy carried the highest diagnostic yield (18.7%, n = 52/278), followed by TBM (15.4%, n = 16/104). Contralateral tonsillectomy, performed in 49.0% (n = 234/478), carried the lowest yield (0.9%, n = 2/234). PET-CT with concurrent MRI was associated with higher primary site detection than PET-CT with concurrent CT (p = 0.003). A minority of patients undergoing treatment with curative intent received first-definitive-treatment within 62 days of referral (15.2%, n = 77/505, median 92 days, IQR: 71–117). Conclusions: Most patients experienced a protracted diagnostic pathway and waited over 3 months for definitive treatment. Earlier PET-CT with concurrent MRI may expedite diagnosis. TBM appears more productive than contralateral tonsillectomy for primary site detection.
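The diagnostic yields and the treatment-timing figure quoted in the abstract above can be rechecked from the counts it reports. The following is a minimal sketch, not part of the study: the dictionary keys and the `percent` helper are illustrative names, and only the counts themselves come from the abstract.

```python
# Counts as reported in the abstract above: (events, denominator).
# The labels are illustrative names chosen for this sketch.
reported = {
    "ipsilateral tonsillectomy yield": (52, 278),
    "tongue base mucosectomy yield": (16, 104),
    "contralateral tonsillectomy yield": (2, 234),
    "first treatment within 62 days": (77, 505),
}


def percent(events: int, total: int) -> float:
    """Return events/total as a percentage rounded to one decimal place."""
    return round(100 * events / total, 1)


# Recompute each percentage from its raw counts.
yields = {name: percent(events, total) for name, (events, total) in reported.items()}

for name, value in yields.items():
    print(f"{name}: {value}%")
```

The recomputed values match the percentages stated in the abstract (18.7%, 15.4%, 0.9% and 15.2%).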
Development of an optimised method for the analysis of human blood plasma samples by atmospheric solids analysis probe mass spectrometry
Analysis of small-molecule metabolites in plasma has the potential for development as a clinical diagnostic and prognostic tool. Atmospheric solids analysis probe mass spectrometry (ASAP-MS) is capable of performing rapid metabolite and small molecule fingerprinting, and has the potential for use in a clinical setting. Combining ASAP-MS data with a predictive model could provide clinicians with a rapid patient risk metric, anticipating disease progression and response to treatment, and thereby aiding in treatment decisions. In order to develop predictive models, experimental errors and uncertainties must be minimised, requiring a robust experimental protocol. In the present study we have performed ASAP-MS measurements on plasma samples from patients recruited for two prospective clinical studies: the Oxford Acute Myocardial Infarction (OxAMI) study; and the Oxford Abdominal Aortic Aneurysm (OxAAA) study. Through a carefully designed series of measurements, we have optimised the method of sample introduction, together with a number of key instrument and data acquisition parameters. Following the optimisation process, we are consistently able to record high quality mass spectra for plasma samples. Typical coefficients of variation for individual mass peaks are in the range from 20%–50%, overlapping with those obtained using more sophisticated LC-MS approaches. The measurement protocol optimises mass spectral quality and reproducibility, while retaining the simplicity of measurement required for use in a clinical setting. While the protocol was developed using plasma samples from two specific patient cohorts, the method can be generalised to any plasma measurements.
Roles for the long non-coding RNA Pax6os1/PAX6-AS1 in pancreatic beta cell function.
Long non-coding RNAs (lncRNAs) are emerging as crucial regulators of beta cell function. Here, we show that an lncRNA-transcribed antisense to Pax6, annotated as Pax6os1/PAX6-AS1, was upregulated by high glucose concentrations in human as well as murine beta cell lines and islets. Elevated expression was also observed in islets from mice on a high-fat diet and patients with type 2 diabetes. Silencing Pax6os1/PAX6-AS1 in MIN6 or EndoC-βH1 cells increased several beta cell signature genes' expression. Pax6os1/PAX6-AS1 was shown to bind to EIF3D, indicating a role in translation of specific mRNAs, as well as histones H3 and H4, suggesting a role in histone modifications. Important interspecies differences were found, with a stronger phenotype in humans. Only female Pax6os1 null mice fed a high-fat diet showed slightly enhanced glucose clearance. In contrast, silencing PAX6-AS1 in human islets enhanced glucose-stimulated insulin secretion and increased calcium dynamics, whereas overexpression of the lncRNA resulted in the opposite phenotype.
Late Treatment With Autologous Expanded Regulatory T-cell Therapy After Alemtuzumab Induction Is Safe and Facilitates Immunosuppression Minimization in Living Donor Renal Transplantation.
BACKGROUND: The TWO Study (Transplantation Without Overimmunosuppression) aimed to investigate a novel approach to regulatory T-cell (Treg) therapy in renal transplant patients, using a delayed infusion protocol at 6 mo posttransplant to promote a Treg-skewed lymphocyte repopulation after alemtuzumab induction. We hypothesized that this would allow safe weaning of immunosuppression to tacrolimus alone. The COVID-19 pandemic led to the suspension of alemtuzumab use, and therefore, we report the unique cohort of 7 patients who underwent the original randomized controlled trial protocol. This study presents a unique insight into Treg therapy combined with alemtuzumab and is therefore an important proof of concept for studies in other diseases that are considering lymphodepletion. METHODS: Living donor kidney transplant recipients were randomized to receive autologous polyclonal Treg at week 26 posttransplantation, coupled with weaning doses of tacrolimus, (Treg therapy arm) or standard immunosuppression alone (tacrolimus and mycophenolate mofetil). Primary outcomes were patient survival and rejection-free survival. RESULTS: Successful cell manufacturing and cryopreservation until the 6-mo infusion were achieved. Patient and transplant survival was 100%. Acute rejection-free survival was 100% in the Treg-treated group at 18 mo after transplantation. Although alemtuzumab caused a profound depletion of all lymphocytes, including Treg, after cell therapy infusion, there was a transient increase in peripheral Treg numbers. CONCLUSIONS: The study establishes that delayed autologous Treg therapy is both feasible and safe, even 12 mo after cell production. The findings present a new treatment protocol for Treg therapy, potentially expanding its applications to other indications.
Reporting guideline for the early-stage clinical evaluation of decision support systems driven by artificial intelligence: DECIDE-AI.
A growing number of artificial intelligence (AI)-based clinical decision support systems are showing promising performance in preclinical, in silico evaluation, but few have yet demonstrated real benefit to patient care. Early-stage clinical evaluation is important to assess an AI system's actual clinical performance at small scale, ensure its safety, evaluate the human factors surrounding its use and pave the way to further large-scale trials. However, the reporting of these early studies remains inadequate. The present statement provides a multi-stakeholder, consensus-based reporting guideline for the Developmental and Exploratory Clinical Investigations of DEcision support systems driven by Artificial Intelligence (DECIDE-AI). We conducted a two-round, modified Delphi process to collect and analyze expert opinion on the reporting of early clinical evaluation of AI systems. Experts were recruited from 20 pre-defined stakeholder categories. The final composition and wording of the guideline was determined at a virtual consensus meeting. The checklist and the Explanation & Elaboration (E&E) sections were refined based on feedback from a qualitative evaluation process. In total, 123 experts participated in the first round of Delphi, 138 in the second round, 16 in the consensus meeting and 16 in the qualitative evaluation. The DECIDE-AI reporting guideline comprises 17 AI-specific reporting items (made of 28 subitems) and ten generic reporting items, with an E&E paragraph provided for each. Through consultation and consensus with a range of stakeholders, we developed a guideline comprising key items that should be reported in early-stage clinical studies of AI-based decision support systems in healthcare. By providing an actionable checklist of minimal reporting items, the DECIDE-AI guideline will facilitate the appraisal of these studies and replicability of their findings.
The IDEAL framework for surgical robotics: development, comparative evaluation and long-term monitoring.
The next generation of surgical robotics is poised to disrupt healthcare systems worldwide, requiring new frameworks for evaluation. However, evaluation during a surgical robot's development is challenging due to their complex evolving nature, potential for wider system disruption and integration with complementary technologies like artificial intelligence. Comparative clinical studies require attention to intervention context, learning curves and standardized outcomes. Long-term monitoring needs to transition toward collaborative, transparent and inclusive consortiums for real-world data collection. Here, the Idea, Development, Exploration, Assessment and Long-term monitoring (IDEAL) Robotics Colloquium proposes recommendations for evaluation during development, comparative study and clinical monitoring of surgical robots, providing practical recommendations for developers, clinicians, patients and healthcare systems. Multiple perspectives are considered, including economics, surgical training, human factors, ethics, patient perspectives and sustainability. Further work is needed on standardized metrics, health economic assessment models and global applicability of recommendations.
Holistic Human-Serving Digitization of Health Care Needs Integrated Automated System-Level Assessment Tools.
Digital health tools, platforms, and artificial intelligence- or machine learning-based clinical decision support systems are increasingly part of health delivery approaches, with an ever-greater degree of system interaction. Critical to the successful deployment of these tools is their functional integration into existing clinical routines and workflows. This depends on system interoperability and on intuitive and safe user interface design. The importance of minimizing emergent workflow stress through human factors research and purposeful design for integration cannot be overstated. Usability of tools in practice is as important as algorithm quality. Regulatory and health technology assessment frameworks recognize the importance of these factors to a certain extent, but their focus remains mainly on the individual product rather than on emergent system and workflow effects. The measurement of performance and user experience has so far been performed in ad hoc, nonstandardized ways by individual actors using their own evaluation approaches. We propose that a standard framework for system-level and holistic evaluation could be built into interacting digital systems to enable systematic and standardized system-wide, multiproduct, postmarket surveillance and technology assessment. Such a system could be made available to developers through regulatory or assessment bodies as an application programming interface and could be a requirement for digital tool certification, just as interoperability is. This would enable health systems and tool developers to collect system-level data directly from real device use cases, enabling the controlled and safe delivery of systematic quality assessment or improvement studies suitable for the complexity and interconnectedness of clinical workflows using developing digital health technologies.
Association of Clinician Diagnostic Performance With Machine Learning-Based Decision Support Systems: A Systematic Review.
IMPORTANCE: An increasing number of machine learning (ML)-based clinical decision support systems (CDSSs) are described in the medical literature, but this research focuses almost entirely on comparing CDSS directly with clinicians (human vs computer). Little is known about the outcomes of these systems when used as adjuncts to human decision-making (human vs human with computer). OBJECTIVES: To conduct a systematic review to investigate the association between the interactive use of ML-based diagnostic CDSSs and clinician performance and to examine the extent of the CDSSs' human factors evaluation. EVIDENCE REVIEW: A search of MEDLINE, Embase, PsycINFO, and grey literature was conducted for the period between January 1, 2010, and May 31, 2019. Peer-reviewed studies published in English comparing human clinician performance with and without interactive use of an ML-based diagnostic CDSS were included. All metrics used to assess human performance were considered as outcomes. The risk of bias was assessed using Quality Assessment of Diagnostic Accuracy Studies (QUADAS-2) and Risk of Bias in Non-Randomised Studies-Intervention (ROBINS-I). Narrative summaries were produced for the main outcomes. Given the heterogeneity of medical conditions, outcomes of interest, and evaluation metrics, no meta-analysis was performed. FINDINGS: A total of 8112 studies were initially retrieved and 5154 abstracts were screened; of these, 37 studies met the inclusion criteria. The median number of participating clinicians was 4 (interquartile range, 3-8). Of the 107 results that reported statistical significance, 54 (50%) were increased by the use of CDSSs, 4 (4%) were decreased, and 49 (46%) showed no change or an unclear change. In the subgroup of studies carried out in representative clinical settings, no association between the use of ML-based diagnostic CDSSs and improved clinician performance could be observed.
Interobserver agreement was the commonly reported outcome whose change was the most strongly associated with CDSS use. Four studies (11%) reported on user feedback, and, in all but 1 case, clinicians decided to override at least some of the algorithms' recommendations. Twenty-eight studies (76%) were rated as having a high risk of bias in at least 1 of the 4 QUADAS-2 core domains, and 6 studies (16%) were considered to be at serious or critical risk of bias using ROBINS-I. CONCLUSIONS AND RELEVANCE: This systematic review found only sparse evidence that the use of ML-based CDSSs is associated with improved clinician diagnostic performance. Most studies had a low number of participants, were at high or unclear risk of bias, and showed little or no consideration for human factors. Caution should be exercised when estimating the current potential of ML to improve human diagnostic performance, and more comprehensive evaluation should be conducted before deploying ML-based CDSSs in clinical settings. The results highlight the importance of considering supported human decisions as end points rather than merely the stand-alone CDSSs outputs.
The IDEAL Reporting Guidelines: A Delphi Consensus Statement Stage Specific Recommendations for Reporting the Evaluation of Surgical Innovation.
OBJECTIVE: The aim of this study was to define reporting standards for IDEAL format studies. BACKGROUND: The IDEAL Framework and Recommendations establish an integrated pathway for evaluation of new surgical techniques and complex therapeutic technologies. However, guidance on implementation has been incomplete, and incorrect use is commonly seen. We describe the consensus development of reporting guidelines for the IDEAL stages, and plans for their dissemination and evaluation. METHODS: Using the EQUATOR Network recommendations, participants with knowledge of IDEAL were surveyed to determine which IDEAL stages needed reporting guidelines. Draft checklists for stages 1, 2a, 2b, and 4 were subsequently developed by 3 researchers (N.B., A.H., P.M.), and revised through a 2-round Delphi consensus process. A final consensus teleconference resolved outstanding disagreements and clarified wording for checklist items. RESULTS: Sixty-one participants completed the initial survey, a clear majority indicating that new reporting guidelines were needed for IDEAL Stage 1 (69.5%), Stage 2a (78%), Stage 2b (74.6%), and Stage 4 (66%). A proposed set of checklists was modified by survey participants in 2 online Delphi rounds (n = 54 and n = 47, respectively), resulting in a penultimate checklist for each stage. Fourteen expert working group members finalized the checklist items and successfully resolved any outstanding areas without agreement on a consensus call. CONCLUSIONS: Participants familiar with IDEAL called for reporting guidelines for studies in all IDEAL stages except stage 3. The checklists developed have the potential to improve standards of reporting and thereby advance the quality of research on surgery and complex interventions and technologies, but require further evaluation in use.
Early development of decision support systems based on artificial intelligence: an application to postoperative complications and a cross-specialty reporting guideline for early-stage clinical evaluation
Background: Complications after major surgery occur in a similar manner internationally but the success of the response process in preventing death varies widely depending on speed and appropriateness. Artificial intelligence (AI) offers new opportunities to provide support to the decision making of clinicians in this stressful situation when uncertainty is high. However, few AI systems have been robustly and successfully tested in real-world clinical settings. Whilst preparing to develop an AI decision support algorithm and planning to evaluate it in real-world settings, a lack of appropriate guidance on reporting early clinical evaluation of such systems was identified. Objectives: The objectives of this work were twofold: i) to develop a prototype AI system to improve the management of postoperative complications; and ii) to understand expert consensus on reporting standards for early-stage evaluation of AI systems in live clinical settings. Methods: I conducted and thematically analysed interviews with clinicians to identify their main challenges and support needs when managing postoperative complications. I then systematically reviewed the literature on the impact of AI-based decision support systems on clinicians’ diagnostic performance. A model based on unsupervised clustering and providing prescription recommendations was developed, optimised, and tested on an internal hold out dataset. Finally, I conducted a Delphi process, to reach expert consensus on minimum reporting standards for the early-stage clinical evaluation of AI systems in live clinical settings. Results: 12 interviews were conducted with junior and senior clinicians identifying 54 themes about challenges, common errors, strategies, and support needs when managing postoperative complications. 37 studies were included in the systematic review, which found no robust evidence of a positive association between the use of AI decision support systems and improved clinician diagnostic performance.
The developed algorithm showed no improvement in recall at position ten compared to a list of the most common prescriptions in the study population. When considering the prevalence of the individual prescriptions, the algorithm showed a 12% relative increase in performance compared to the same baseline. 151 experts participated in the Delphi study, representing 18 countries and 20 stakeholder groups. The final DECIDE-AI checklist comprises 27 items, accompanied by Explanation & Elaboration sections for each. Conclusion: The proposed algorithm offers a proof of concept for an AI system to improve the management of postoperative complications. However, it needs further development and evaluation before claiming clinical utility. The DECIDE-AI guideline provides a practicable checklist for researchers reporting on the implementation of AI decision support systems in clinical settings, and merits future iterative evaluation-update cycles in practice.
Artificial intelligence in medical device software and high-risk medical devices - a review of definitions, expert recommendations and regulatory initiatives.
INTRODUCTION: Artificial intelligence (AI) encompasses a wide range of algorithms with risks when used to support decisions about diagnosis or treatment, so professional and regulatory bodies are recommending how they should be managed. AREAS COVERED: AI systems may qualify as standalone medical device software (MDSW) or be embedded within a medical device. Within the European Union (EU) AI software must undergo a conformity assessment procedure to be approved as a medical device. The draft EU Regulation on AI proposes rules that will apply across industry sectors, while for devices the Medical Device Regulation also applies. In the CORE-MD project (Coordinating Research and Evidence for Medical Devices), we have surveyed definitions and summarize initiatives made by professional consensus groups, regulators, and standardization bodies. EXPERT OPINION: The level of clinical evidence required should be determined according to each application and to legal and methodological factors that contribute to risk, including accountability, transparency, and interpretability. EU guidance for MDSW based on international recommendations does not yet describe the clinical evidence needed for medical AI software. Regulators, notified bodies, manufacturers, clinicians and patients would all benefit from common standards for the clinical evaluation of high-risk AI applications and transparency of their evidence and performance.
Examining the empirical evidence for IDEAL 2b studies: the effects of preceding prospective collaborative cohort studies on the quality and impact of subsequent randomized controlled trials of surgical innovations - protocol for a systematic review and case-control analysis.
Randomized controlled trials (RCTs) in surgery face methodological challenges, which often result in low quality or failed trials. The Idea, Development, Exploration, Assessment and Long-term (IDEAL) framework proposes preliminary prospective collaborative cohort studies with specific properties (IDEAL 2b studies) to increase the quality and feasibility of surgical RCTs. Little empirical evidence exists for this proposition, and specifically designed 2b studies are currently uncommon. Prospective collaborative cohort studies are, however, relatively common, and might provide similar benefits. We will, therefore, assess the association between prior 'IDEAL 2b-like' cohort studies and the quality and impact of surgical RCTs. We propose a systematic review using two parallel case-control analyses, with surgical RCTs as subjects and study quality and journal impact factor (IF) as the outcomes of interest. We will search for surgical RCTs published between 2015 and 2019 and prior prospective collaborative cohort studies authored by any of the RCT investigators. RCTs will be categorized into cases or controls by (1) journal (IF ≥ or < 5) and (2) study quality (PEDro score ≥ or < 7). The case/control OR of exposure to a prior '2b-like' study will be calculated independently for quality and impact. Cases will be matched 1:1 with controls by year of publication, and confounding by peer-reviewed funding, author academic affiliation and trial protocol registration will be examined using multiple logistic regression analysis. This study will examine whether preparatory IDEAL 2b-like studies are associated with higher quality and impact of subsequent RCTs.
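The case/control odds ratio described in the protocol above can be illustrated with a crude 2x2 table calculation. This is a minimal sketch under stated assumptions: the counts are hypothetical placeholders, not study data, the function names are illustrative, and the calculation is an unmatched odds ratio with a Wald 95% confidence interval. The protocol specifies 1:1 matching and multiple logistic regression, which this sketch does not reproduce.

```python
import math


def odds_ratio(exposed_cases: int, unexposed_cases: int,
               exposed_controls: int, unexposed_controls: int) -> float:
    """Crude odds ratio of exposure for cases versus controls (2x2 table)."""
    return (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)


def wald_ci(exposed_cases: int, unexposed_cases: int,
            exposed_controls: int, unexposed_controls: int,
            z: float = 1.96) -> tuple[float, float]:
    """Wald confidence interval for the odds ratio, computed on the log scale."""
    log_or = math.log(odds_ratio(exposed_cases, unexposed_cases,
                                 exposed_controls, unexposed_controls))
    # Standard error of log(OR) is the root of the summed reciprocal cell counts.
    se = math.sqrt(1 / exposed_cases + 1 / unexposed_cases
                   + 1 / exposed_controls + 1 / unexposed_controls)
    return math.exp(log_or - z * se), math.exp(log_or + z * se)


# HYPOTHETICAL counts for illustration only: "exposed" means the RCT was
# preceded by a '2b-like' cohort study; "cases" are, for example, RCTs with IF >= 5.
table = dict(exposed_cases=30, unexposed_cases=70,
             exposed_controls=15, unexposed_controls=85)

or_estimate = odds_ratio(**table)
ci_low, ci_high = wald_ci(**table)
print(f"OR = {or_estimate:.2f} (95% CI {ci_low:.2f} to {ci_high:.2f})")
```

An odds ratio above 1 in such a table would indicate that cases were more often preceded by a '2b-like' study than controls. The same calculation would be run separately for the quality and impact definitions of a case.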