The application value of large language models in predicting the natural outcome of ventricular septal defect
Chinese Journal of Clinical Thoracic and Cardiovascular Surgery

Authors：

MA Yujun, ZHAO Yudie, CAO Wenxin, WU Kaihong

Department of Cardiothoracic Surgery, Children’s Hospital of Nanjing Medical University, Nanjing, 210000, P. R. China

Corresponding author：

WU Kaihong, Email: pumcwu@sina.com

Keywords：

Ventricular septal defect; large language models; ChatGPT; Grok; DeepSeek; expert assessment; natural outcome; clinical decision-making

DOI：

10.7507/1007-4848.202512090

Objective: To evaluate the accuracy of three large language models (LLMs), ChatGPT, Grok, and DeepSeek, in predicting the natural outcome of pediatric ventricular septal defect (VSD), to measure their discrepancies with actual clinical outcomes, and to explore whether LLMs can help clinicians provide personalized management recommendations.

Methods: A retrospective analysis was performed on clinical data from pediatric patients with VSD admitted to Children's Hospital of Nanjing Medical University between October and December 2020. VSD severity, probability of spontaneous closure, and necessity of surgery were each evaluated by ChatGPT, Grok, DeepSeek, and an expert panel. Differences between groups were analyzed, and each group's assessments were compared with the actual outcomes. The stability of model performance was assessed from three repeated assessments by each LLM.

Results: A total of 146 children were enrolled, including 87 (59.6%) males and 59 (40.4%) females, with a median age at first diagnosis of 2.0 months (IQR: 1.1-3.4). The Grok group differed significantly from the expert panel in assessing the probability of spontaneous closure and the necessity of surgery (P=0.01 and P=0.02, respectively). The ChatGPT group also differed from the expert panel in evaluating the necessity of surgery (P=0.05). Compared with the actual clinical outcomes, only the Grok group showed a significant difference (P<0.05), while ChatGPT achieved the highest consistency between predicted and actual outcomes. Intra-group analysis of the three repeated assessments in the LLM groups showed no statistically significant differences (all P>0.05).

Conclusion: LLMs demonstrate potential and high stability in predicting the natural outcome of VSD. In particular, ChatGPT shows the highest consistency between its assessments and actual outcomes. LLMs can serve as an auxiliary tool to support the formulation of personalized management strategies.
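The abstract reports "consistency" between predicted and actual outcomes without naming the statistic used. As a minimal sketch only, assuming consistency is measured as raw percent agreement and as Cohen's kappa (both are assumptions, not the authors' stated method), the comparison for one model could look like this. The function names and the toy labels are hypothetical and are not study data.

```python
from collections import Counter


def percent_agreement(predicted, actual):
    """Fraction of cases where the predicted label equals the actual label."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


def cohen_kappa(predicted, actual):
    """Chance-corrected agreement between two label sequences."""
    n = len(predicted)
    observed = percent_agreement(predicted, actual)
    pred_counts = Counter(predicted)
    actual_counts = Counter(actual)
    # Agreement expected if the two sets of labels were independent.
    labels = set(pred_counts) | set(actual_counts)
    expected = sum(pred_counts[l] * actual_counts[l] for l in labels) / (n * n)
    if expected == 1.0:
        return 1.0  # both sequences consist of one identical label
    return (observed - expected) / (1.0 - expected)


# Hypothetical toy labels (NOT study data): natural closure vs. surgery.
actual = ["closed", "closed", "surgery", "surgery", "closed", "surgery"]
model_pred = ["closed", "surgery", "surgery", "surgery", "closed", "closed"]

agreement = percent_agreement(model_pred, actual)
kappa = cohen_kappa(model_pred, actual)
print(f"agreement={agreement:.3f}, kappa={kappa:.3f}")
```

Kappa is shown alongside raw agreement because raw agreement can look high when one outcome dominates the cohort; whether the study corrected for chance in this way is not stated in the abstract.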
