• 1. Department of Thoracic Surgery, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310003, P. R. China;
  • 2. College of Biomedical Engineering & Instrument Science, Zhejiang University, Hangzhou, 310027, P. R. China;
  • 3. Department of Information Technology, The First Affiliated Hospital, Zhejiang University School of Medicine, Hangzhou, 310003, P. R. China;
  • 4. Key Laboratory of Clinical Evaluation Technology for Medical Device of Zhejiang Province, Hangzhou, 310003, P. R. China;
LV Xudong, Email: lvxd@zju.edu.cn; HU Jian, Email: dr_hujian@zju.edu.cn

Objective  To construct a disease-specific database for lung cancer surgery that covers the entire perioperative care pathway, thereby improving the quality and usability of key surgical data elements. Methods  Real-world clinical data were extracted from a single-center thoracic surgery department. A standardized data model was established based on the open electronic health record (openEHR) standard. Large language models (LLMs), optical character recognition (OCR), and other artificial intelligence (AI)-driven techniques were employed to extract, structure, and quality-control unstructured clinical narratives, imaging reports, and radiological data, with a focus on capturing surgically relevant perioperative indicators. Results  A multimodal database comprising 19 917 patients was established, including 7 930 males and 11 987 females, aged 15 to 97 (61.7±9.7) years. The database includes 582 structured data variables, textual report data corresponding to 69 clinical indicators, 13 000 pulmonary function test PDF reports, and chest CT imaging data from 16 884 patients. It covers the major information relevant to the surgical diagnosis and treatment of lung cancer and significantly improves the completeness and granularity of surgical detail data. LLM and OCR technologies improved the efficiency of converting unstructured data into structured formats, while a multi-level manual verification process ensured data accuracy and traceability. The database supports real-world research, including comparisons of surgical procedures, prediction of postoperative complications, prognosis assessment, and multimodal data association analyses.

Copyright © the editorial department of Chinese Journal of Clinical Thoracic and Cardiovascular Surgery of West China Medical Publisher. All rights reserved