• 1. School of Communication Engineering, Chengdu University of Information Technology, Chengdu, Sichuan 610225, P. R. China;
  • 2. Institute of Hospital Management, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, P. R. China;
  • 3. Institute of Respiratory Health and Multimorbidity, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, P. R. China;
  • 4. Health Management Center, General Practice Medical Center, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, P. R. China;
GU Tao, Email: gutao@swufe.edu.cn

Objective  To develop a computer-aided diagnosis model for lung cancer based on routine health examination data, in order to identify individuals at high current risk of lung cancer in health screening settings and to provide decision support for subsequent clinical confirmation.

Methods  Individuals who underwent health examinations at the Health Management Center of West China Hospital, Sichuan University, between 2010 and 2022 were enrolled. After screening, a retrospective cohort of 5257 subjects was retained, comprising 1307 patients with lung cancer and 3950 non-lung cancer controls. A three-tier feature fusion model was designed. (1) Heterogeneous feature encoding module: a multi-layer perceptron and bidirectional encoder representations from transformers (BERT) were employed to extract feature vectors from structured data and unstructured data (medical records and imaging report texts), respectively. (2) Heterogeneous feature fusion architecture: dimensional expansion and concatenation, coupled with a gated recurrent unit (GRU)-based gating network, were used to achieve multi-scale feature alignment and deep interaction, thereby addressing dimensional discrepancies and information redundancy. (3) Attention-based decision mechanism: word-level attention with weighted pooling was applied to dynamically capture key features and generate risk probability distributions. Model performance was evaluated using precision, recall, F1-score, and the area under the receiver operating characteristic curve (AUC-ROC).

Results  The proposed model significantly outperformed both single-data-type models and simple concatenation approaches. On the test set, the proposed model achieved a recall of 0.861, an F1-score of 0.882, and an AUC-ROC of 0.972, substantially surpassing the best-performing model trained on structured data alone (extreme gradient boosting: recall=0.630, F1-score=0.725, AUC-ROC=0.916) and the model trained on unstructured data alone (BERT coupled with a bidirectional long short-term memory network: recall=0.833, F1-score=0.846, AUC-ROC=0.944). Feature elimination experiments showed minimal performance variation across different feature subsets, confirming the model’s ability to identify irrelevant features and mitigate their impact. Subgroup analyses revealed that the model performed best in female subjects (recall=0.835, F1-score=0.838, AUC-ROC=0.950) and in individuals aged >69 years (recall=0.913, F1-score=0.875, AUC-ROC=0.911).

Conclusion  The proposed model, built on heterogeneous health examination data, can identify individuals at high risk of lung cancer among health examination populations using only routine screening data, thereby facilitating the early diagnosis of lung cancer in this population.
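The abstract names precision among the evaluation metrics but reports only recall and F1-score. As a rough illustration, the sketch below recovers the precision implied by each reported recall/F1 pair, using the standard definition F1 = 2PR / (P + R). The function names and the dictionary are hypothetical helpers introduced here, not part of the study, and the inputs are the rounded three-decimal values from the abstract, so the results are approximations only.

```python
# Illustrative only: derive the precision implied by rounded recall and F1 values.
# Function names and the REPORTED dictionary are assumptions made for this sketch.

def implied_precision(recall: float, f1: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, which gives P = F1 * R / (2R - F1)."""
    denominator = 2 * recall - f1
    if denominator <= 0:
        raise ValueError("F1 must be smaller than 2 * recall")
    return f1 * recall / denominator


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (recall, F1-score) pairs as reported for the test set in the abstract.
REPORTED = {
    "proposed fusion model": (0.861, 0.882),
    "XGBoost (structured data only)": (0.630, 0.725),
    "BERT + BiLSTM (text only)": (0.833, 0.846),
}

if __name__ == "__main__":
    for name, (recall, f1) in REPORTED.items():
        precision = implied_precision(recall, f1)
        print(f"{name}: implied precision is about {precision:.3f}")
```

Under these assumptions the implied precision comes out at roughly 0.90 for the proposed model, 0.85 for extreme gradient boosting, and 0.86 for BERT with a bidirectional long short-term memory network. This suggests that the gap between the models lies mainly in recall rather than precision.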

Citation: LI Yulin, SONG Lijun, TANG Xiumei, ZHONG Jiandan, JI Guiyi, LI Weimin, GU Tao. A feature fusion framework for lung cancer computer-aided diagnosis model: development and application based on heterogeneous data from health examination populations. West China Medical Journal, 2026, 41(4): 554-561. doi: 10.7507/1002-0179.202509119

Copyright © the editorial department of West China Medical Journal of West China Medical Publisher. All rights reserved
