Development of a machine learning-based preoperative prediction model for spread through air spaces in early-stage lung adenocarcinoma

Chinese Journal of Clinical Thoracic and Cardiovascular Surgery

Authors：

CHU Kai¹,², XU Xinrong², LIU Zhenyu¹,², REN Qinglin², HE Wenbo², HU Minlu², WANG Xiaolin², SHU Yusheng¹,²

1. The Affiliated Yangzhou Clinical College of Xuzhou Medical University, Yangzhou, 225000, Jiangsu, P. R. China;
2. Department of Thoracic Surgery, Northern Jiangsu People’s Hospital, Yangzhou, 225000, Jiangsu, P. R. China;

Corresponding author：

SHU Yusheng, Email: shuyusheng65@163.com

Keywords：

Machine learning; spread through air spaces; early-stage lung adenocarcinoma; SHAP analysis; preoperative prediction; radiological features; clinical decision-making; risk stratification

DOI：

10.7507/1007-4848.202511037

Objective To develop and validate a machine learning model based on preoperative clinical characteristics, laboratory indices, and radiological features for the non-invasive prediction of spread through air spaces (STAS) in patients with early-stage lung adenocarcinoma.

Methods Preoperative data from patients with early-stage lung adenocarcinoma who underwent surgical resection at Northern Jiangsu People's Hospital between January 2020 and August 2025 were retrospectively collected. The data included clinical characteristics, laboratory indices, and radiological features. Patients were divided into STAS-positive and STAS-negative groups based on postoperative pathological findings. The dataset was randomly split into a training set and a testing set at a 7:3 ratio. Feature variables were selected using the minimum redundancy maximum relevance (mRMR) algorithm and least absolute shrinkage and selection operator (LASSO) regression. Five machine learning models were constructed: logistic regression (LR), random forest (RF), support vector machine (SVM), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and decision curve analysis (DCA). The Shapley additive explanations (SHAP) method was used to interpret the optimal prediction model.

Results A total of 377 patients were included, comprising 177 males (46.9%) and 200 females (53.1%), with a mean age of 63.31 ± 9.73 years. There were 261 patients in the training set and 116 patients in the testing set. In the training set, statistically significant differences were observed between the STAS-positive group (n=130) and the STAS-negative group (n=131) in multiple features, including age, sex, neutrophil-to-lymphocyte ratio (NLR), monocyte-to-lymphocyte ratio (MLR), clinical T stage, and maximum solid component diameter (P<0.05). A final set of 10 feature variables was selected by combining mRMR and LASSO regression, and the five machine learning models (LR, RF, SVM, LightGBM, XGBoost) were developed on these features. The XGBoost model demonstrated superior predictive performance in both the training and testing sets, with AUCs of 0.947 [95% CI (0.920, 0.975)] and 0.943 [95% CI (0.894, 0.993)], respectively, and was the best-performing model in the testing set. DCA indicated that the XGBoost model provided a high net clinical benefit across a wide range of threshold probabilities. SHAP analysis revealed that the vessel convergence sign, clinical T stage, age, consolidation-to-tumor ratio (CTR), and MLR were the features contributing most to STAS prediction.

Conclusion The XGBoost model effectively predicts preoperative STAS status in early-stage lung adenocarcinoma, with excellent discriminative performance and good clinical interpretability. Key predictors such as the vessel convergence sign, clinical T stage, age, and CTR provide an important reference for preoperative risk assessment and the individualized selection of surgical strategies, ultimately benefiting patients.
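The two evaluation measures named in the abstract, AUC and the net benefit underlying decision curve analysis, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`auc_score`, `net_benefit`, `net_benefit_treat_all`) and the eight-case toy dataset are hypothetical and chosen only for illustration. The sketch assumes the standard definitions, with AUC as the probability that a randomly chosen positive case scores above a randomly chosen negative one, and net benefit at threshold probability pt as TP/n - FP/n × pt/(1 - pt).

```python
from typing import Sequence


def auc_score(y_true: Sequence[int], y_prob: Sequence[float]) -> float:
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a correct pair.
    """
    pos = [p for y, p in zip(y_true, y_prob) if y == 1]
    neg = [p for y, p in zip(y_true, y_prob) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    correct = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return correct / (len(pos) * len(neg))


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float],
                threshold: float) -> float:
    """Net benefit of acting on every case with predicted risk >= threshold."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(y_true)
    tp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 1)
    fp = sum(1 for y, p in zip(y_true, y_prob) if p >= threshold and y == 0)
    # False positives are weighted by the odds of the threshold probability.
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: treat every case as positive."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (NOT from the study): 1 = STAS-positive, 0 = STAS-negative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_prob = [0.9, 0.8, 0.7, 0.4, 0.6, 0.3, 0.2, 0.1]

print(auc_score(y_true, y_prob))              # 0.9375
print(net_benefit(y_true, y_prob, 0.5))       # 0.25
print(net_benefit_treat_all(y_true, 0.5))     # 0.0
```

A decision curve is obtained by evaluating `net_benefit` over a grid of thresholds and comparing it with the treat-all line and the treat-none line (net benefit 0). A model is considered clinically useful over the threshold range in which its curve lies above both reference lines.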


