With the widespread adoption of lung cancer screening, an increasing number of patients are being diagnosed with early-stage lung adenocarcinoma. For stage ⅠA lung adenocarcinoma, sublobar resection is the primary treatment approach. However, for patients with concomitant spread through air spaces (STAS), numerous studies advocate lobectomy as the mainstay of treatment. Because of the limitations of preoperative prediction and intraoperative frozen section evaluation in assessing STAS, current research largely relies on clinical and imaging features to predict STAS, with inconsistent and unsatisfactory results. Furthermore, most studies focus on individual clinical or imaging characteristics, and large-sample investigations are lacking. The rise of artificial intelligence in recent years has offered new approaches to this problem, and existing studies show that artificial intelligence predicts STAS better than conventional methods. This article reviews the value of artificial intelligence in predicting STAS.
Objective To develop and validate a machine learning model based on preoperative clinical characteristics, laboratory indices, and radiological features for the non-invasive prediction of spread through air spaces (STAS) in patients with early-stage lung adenocarcinoma. Methods Preoperative data from patients with early-stage lung adenocarcinoma who underwent surgical resection at Northern Jiangsu People's Hospital between January 2020 and August 2025 were retrospectively collected. The data included clinical characteristics, laboratory indices, and radiological features. Patients were divided into STAS-positive and STAS-negative groups based on postoperative pathological findings. The dataset was randomly split into a training set and a testing set at a 7 : 3 ratio. Feature variables were selected using the maximum relevance and minimum redundancy (mRMR) algorithm and least absolute shrinkage and selection operator (LASSO) regression. Five machine learning models were constructed: logistic regression (LR), random forest (RF), support vector machine (SVM), light gradient boosting machine (LightGBM), and extreme gradient boosting (XGBoost). Model performance was evaluated using the area under the receiver operating characteristic curve (AUC) and decision curve analysis (DCA). The Shapley additive explanations (SHAP) method was employed to interpret the optimal prediction model. Results A total of 377 patients were included, comprising 177 males (46.9%) and 200 females (53.1%), with a mean age of (63.31±9.73) years. There were 261 patients in the training set and 116 patients in the testing set. In the training set, statistically significant differences were observed between the STAS-positive group (n=130) and the STAS-negative group (n=131) in multiple features, including age, sex, neutrophil-to-lymphocyte ratio (NLR), monocyte-to-lymphocyte ratio (MLR), clinical T stage, and maximum solid component diameter (P<0.05).
A final set of 10 feature variables was selected by combining mRMR and LASSO regression, and five machine learning models (LR, RF, SVM, LightGBM, XGBoost) were developed. The XGBoost model performed strongly in both the training and testing sets, achieving AUCs of 0.947 [95%CI (0.920, 0.975)] and 0.943 [95%CI (0.894, 0.993)], respectively, and was the best-performing model in the testing set. DCA indicated that the XGBoost model provided a high net clinical benefit across a wide range of threshold probabilities. SHAP analysis revealed that the vessel convergence sign, clinical T stage, age, consolidation-to-tumor ratio (CTR), and MLR were the features contributing most to STAS prediction. Conclusion The XGBoost model effectively predicts preoperative STAS status in early-stage lung adenocarcinoma, exhibiting excellent discriminative performance and good clinical interpretability. Key predictors such as the vessel convergence sign, clinical T stage, age, and CTR provide a crucial reference for preoperative risk assessment and the individualized selection of surgical strategies, ultimately benefiting patients.
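The two evaluation measures named in the Methods above, AUC and decision curve analysis, can be illustrated with a minimal sketch. This is not the authors' code: the function names (`auc`, `net_benefit`, `net_benefit_treat_all`) and the toy labels and scores are hypothetical, chosen only to show how each quantity is computed. AUC is calculated here as the proportion of positive-negative pairs ranked correctly, and net benefit follows the standard DCA formula TP/n - (FP/n) × pt/(1 - pt), where pt is the threshold probability.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs in which the
    positive case has the higher score; tied pairs count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def net_benefit(labels, scores, threshold):
    """Net benefit of the model at threshold probability pt:
    TP/n - (FP/n) * pt / (1 - pt)."""
    if not 0.0 < threshold < 1.0:
        raise ValueError("threshold must lie strictly between 0 and 1")
    n = len(labels)
    tp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(labels, scores) if s >= threshold and y == 0)
    return tp / n - fp / n * threshold / (1.0 - threshold)


def net_benefit_treat_all(labels, threshold):
    """Reference curve in DCA: every patient is treated as positive."""
    prevalence = sum(labels) / len(labels)
    return prevalence - (1.0 - prevalence) * threshold / (1.0 - threshold)


# Hypothetical toy data (1 = STAS-positive, 0 = STAS-negative); not study data.
toy_labels = [1, 1, 1, 0, 0, 0, 1, 0]
toy_scores = [0.9, 0.8, 0.4, 0.3, 0.2, 0.6, 0.7, 0.1]

toy_auc = auc(toy_labels, toy_scores)
toy_nb_model = net_benefit(toy_labels, toy_scores, 0.5)
toy_nb_all = net_benefit_treat_all(toy_labels, 0.5)
```

A decision curve is obtained by evaluating `net_benefit` over a range of thresholds and comparing it with the treat-all reference and with treat-none (net benefit 0). A model is useful at a given threshold when its net benefit exceeds both.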
Objective To evaluate the value of clinical radiological features combined with circulating tumor cells (CTCs) in assessing the invasiveness of subsolid nodules in lung cancer. Methods Clinical data of 296 patients treated at the First Hospital of Lanzhou University between February 2019 and February 2021 were retrospectively analyzed. There were 130 males and 166 females, with a median age of 62.00 years. Patients were randomly divided into a training set and an internal validation set at a ratio of 3 : 1 using a random number table. The patients were divided into two groups: a preinvasive lesion group (atypical adenomatous hyperplasia and adenocarcinoma in situ) and an invasive lesion group (minimally invasive adenocarcinoma and invasive adenocarcinoma). Independent risk factors were identified by regression analysis in the training set, and a nomogram prediction model was constructed. The accuracy and consistency of the model were verified by the receiver operating characteristic curve and the calibration curve, respectively. Subgroup analysis was conducted on nodules of different diameters to further verify the performance of the model. Specific performance metrics at the threshold, including sensitivity, specificity, positive predictive value, negative predictive value, and accuracy, were calculated. Results The independent risk factors for subsolid nodules identified by regression analysis were age, CTCs level, nodular nature, lobulation, and spiculation. In the training set, the nomogram prediction model yielded an area under the curve (AUC) of 0.914 (0.872, 0.956), outperforming the clinical radiological features model [AUC 0.856 (0.794, 0.917), P=0.003] and CTCs alone [AUC 0.750 (0.675, 0.825), P=0.001]. The C-index was 0.914 and 0.894, and the corrected C-index was 0.902 and 0.843, in the training set and internal validation set, respectively.
The AUC of the prediction model in the training set was 0.902 (0.848, 0.955), 0.913 (0.860, 0.966), and 0.873 (0.730, 1.000) for nodules with diameters of 5-20 mm, 10-20 mm, and 21-30 mm, respectively. Conclusion The prediction model combining clinical radiological features with CTCs has better diagnostic value for assessing the invasiveness of subsolid nodules than the clinical radiological features model or CTCs alone.
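The threshold-based metrics listed in the Methods of this abstract (sensitivity, specificity, positive predictive value, negative predictive value, and accuracy) can be sketched as follows. This is an illustration under stated assumptions, not the study's analysis: the function name `diagnostic_metrics`, the 0.5 cutoff, and the toy labels and scores are all hypothetical.

```python
def diagnostic_metrics(labels, scores, threshold):
    """Classify each case as positive when score >= threshold, then derive
    the five metrics from the resulting 2x2 confusion table."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if predicted_positive and y == 1:
            tp += 1
        elif predicted_positive and y == 0:
            fp += 1
        elif not predicted_positive and y == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Undefined ratios (empty denominator) are reported as NaN.
        return numerator / denominator if denominator else float("nan")

    return {
        "sensitivity": ratio(tp, tp + fn),
        "specificity": ratio(tn, tn + fp),
        "ppv": ratio(tp, tp + fp),
        "npv": ratio(tn, tn + fn),
        "accuracy": ratio(tp + tn, tp + fp + tn + fn),
    }


# Hypothetical toy data (1 = invasive lesion, 0 = preinvasive lesion).
toy_labels = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.6, 0.2, 0.1, 0.35]

toy_metrics = diagnostic_metrics(toy_labels, toy_scores, threshold=0.5)
```

Because predictive values depend on the proportion of invasive lesions in the sample, values computed this way apply to the cohort at hand and do not transfer directly to populations with a different prevalence.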