• 1. Department of Thoracic Surgery, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310003, P. R. China;
  • 2. College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou, 310027, P. R. China;
  • 3. Department of Information Technology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310003, P. R. China;
  • 4. Key Laboratory of Clinical Evaluation Technology for Medical Device of Zhejiang Province, Hangzhou, 310003, P. R. China;
LV Xudong, Email: lvxd@zju.edu.cn; HU Jian, Email: dr_hujian@zju.edu.cn

Objective To construct a disease-specific database for lung cancer surgery that covers the entire perioperative care pathway, thereby improving the quality and usability of key surgical data elements. Methods Real-world clinical data were extracted from a single-center thoracic surgery department. A standardized data model was established based on the open electronic health record (openEHR) standard. Large language models (LLMs), optical character recognition (OCR), and other artificial intelligence (AI)-driven techniques were used to extract, structure, and quality-control unstructured clinical narratives, imaging reports, and radiological data, with a focus on capturing surgically relevant perioperative indicators. Results A multimodal database comprising 19 917 patients was established, including 7 930 males and 11 987 females, aged 15 to 97 years (mean 61.7±9.7 years). The database includes 582 structured data variables, textual report data corresponding to 69 clinical indicators, 13 000 pulmonary function test PDF reports, and chest CT imaging data from 16 884 patients. It covers the major information relevant to the surgical diagnosis and treatment of lung cancer and significantly improves the completeness and granularity of surgical detail data. LLMs and OCR improved the efficiency of converting unstructured data into structured formats, while a multi-level manual verification process ensured data accuracy and traceability. The database supports real-world research, including comparisons of surgical procedures, prediction of postoperative complications, prognosis assessment, and multimodal data association analyses.

Citation: HUANG Xuhua, NIE Yunfeng, SHEN Liang, KONG Pengxu, TAN Xin, LI Zihao, LV Wang, ZHOU Min, LV Xudong, HU Jian. Construction and clinical application exploration of an artificial intelligence-based high-quality lung cancer surgery dataset. Chinese Journal of Clinical Thoracic and Cardiovascular Surgery, 2026, 33(5): 717-727. doi: 10.7507/1007-4848.202512108

Copyright © the editorial department of Chinese Journal of Clinical Thoracic and Cardiovascular Surgery of West China Medical Publisher. All rights reserved.
