• Department of Cardiothoracic Surgery, Children’s Hospital of Nanjing Medical University, Nanjing, 210000, P. R. China;
WU Kaihong, Email: pumcwu@sina.com
Export PDF Favorites Scan Get Citation

Objective  To evaluate the accuracy of three large language models (LLMs), ChatGPT, Grok, and DeepSeek, in predicting the natural outcome of pediatric ventricular septal defect (VSD) and their discrepancies with actual clinical outcomes, providing insights into whether LLMs can assist clinicians in providing personalized management recommendations. Methods A retrospective analysis of clinical data from pediatric patients with VSD admitted to Children's Hospital of Nanjing Medical University between October and December 2020. The VSD severity, spontaneous closure probability and surgical necessity were evaluated by ChatGPT, Grok, DeepSeek, and the expert panel, respectively. Intergroup differences were analyzed and also compared with the actual outcomes. The stability of model performance was compared based on three repeated assessments by LLMs. Results  A total of 146 children were enrolled, including 87 (59.6%) males and 59 (40.4%) females, with a median age at first diagnosis of 2.0 months (IQR: 1.1-3.4). Significant differences were observed between the Grok group and the expert panel in assessing the probability of spontaneous closure and the necessity of surgery (P=0.01, 0.02). The ChatGPT group also differed from the expert panel in evaluating the necessity of surgery (P=0.05). In comparison with the actual clinical outcomes, only the Grok group showed a significant difference (P<0.05), while ChatGPT achieved the highest consistency between predicted outcomes and actual outcomes. Intra-group analysis of three repeated assessments in the LLMs groups showed no statistically significant differences (all P>0.05). Conclusion  LLMs demonstrate potential and high stability in predicting the natural outcome of VSD. In particular, ChatGPT shows the highest consistency between its assessments and actual outcomes. LLMs can serve as an auxiliary tool to support the formulation of personalized management strategy.

Copyright ? the editorial department of Chinese Journal of Clinical Thoracic and Cardiovascular Surgery of West China Medical Publisher. All rights reserved