1. Operation and Management Department, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, P. R. China;
2. West China Med Co. Ltd, Chengdu, Sichuan 610041, P. R. China;
3. Business School, Sichuan University, Chengdu, Sichuan 610065, P. R. China;
4. Outpatient Department, West China Hospital, Sichuan University, Chengdu, Sichuan 610041, P. R. China.
ZHANG Xinli, Email: zhangxinli1231@163.com; ZHAO Shuzhen, Email: ssszszzsz@126.com

Objective  To integrate data generated by regional general hospitals and propose an ensemble learning approach for accurately forecasting major epidemic outbreaks.
Methods  Using preprocessed multi-source data collected from a large general hospital in Southwest China between January 2020 and December 2022, we first identified the key early-warning departments and, from them, the original variables for the forecasting framework. Principal component analysis was then used to extract the main modeling factors. Ten common prediction models were built, and the base model was selected by comparing 5 performance indicators and optimizing parameters. On this basis, sampling learning models were constructed using 6 sampling techniques under 3 data balancing strategies. After performance comparison, an ensemble prediction model with confirmed cases as the outcome indicator was established.
Results  Eight online or offline original variables were identified. Among them, four could serve as predictors for epidemic forecasting: the number of fever outpatient visits, the number of patients with positive signs in the Department of Integrated Traditional Chinese and Western Medicine, the volume of online consultations in the Department of Respiratory and Critical Care Medicine, and the volume of online consultations in the Infectious Disease Center. After collinearity was eliminated, 3 principal components were extracted. Random forest was selected as the base model from the 10 initial models. In the comparison of sampling-based ensemble models, the easy ensemble classifier-random forest (EEC-RF) model showed comparative advantages, with a recall rate of 0.857, an accuracy of 0.812, and an area under the curve (AUC) of 0.911.
Conclusions  The EEC-RF model offers better classification and generalization performance when handling minority-class epidemic events, improving the accuracy of forecasting sudden outbreaks. In particular, its high recall rate, accuracy, and AUC support its viability as a primary tool for public health early-warning systems.
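The balanced-sampling ensemble idea and the three reported metrics can be illustrated with a minimal sketch. This is not the authors' implementation: a one-feature threshold learner stands in for the random forest, the daily fever-clinic counts and outbreak labels are invented for illustration, and every function name is hypothetical. The sketch only shows how each base learner sees all minority-class (outbreak) days plus an equal-sized random draw of majority-class days, and how recall, accuracy, and AUC are then computed from the averaged votes.

```python
import random

def balanced_subsets(labels, n_subsets, seed=0):
    """Yield index lists holding every minority (1) sample plus an
    equal-sized random draw of majority (0) samples."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(labels) if y == 1]
    neg = [i for i, y in enumerate(labels) if y == 0]
    for _ in range(n_subsets):
        yield pos + rng.sample(neg, len(pos))

def fit_stump(xs, ys):
    """Toy base learner (stand-in for random forest): a threshold halfway
    between the class means. Assumes outbreak days have higher counts."""
    mean_pos = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    mean_neg = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    return (mean_pos + mean_neg) / 2

def ensemble_score(x, thresholds):
    """Fraction of base learners that vote 'outbreak' for value x."""
    return sum(x > t for t in thresholds) / len(thresholds)

def recall_accuracy(y_true, y_pred):
    """Recall = TP / (TP + FN); accuracy = correct / total."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return tp / (tp + fn), correct / len(y_true)

def auc(y_true, scores):
    """Probability that a random positive outscores a random negative
    (ties count as one half)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Invented training data: daily fever-clinic visits, 1 = outbreak day.
train_x = [10, 12, 11, 13, 9, 14, 12, 10, 11, 13, 30, 28]
train_y = [0] * 10 + [1, 1]

# One threshold learner per balanced subset.
thresholds = []
for idx in balanced_subsets(train_y, n_subsets=5):
    thresholds.append(fit_stump([train_x[i] for i in idx],
                                [train_y[i] for i in idx]))

# Invented hold-out data, scored by averaging the learners' votes.
test_x = [11, 13, 22, 27, 31, 18]
test_y = [0, 0, 0, 1, 1, 1]
scores = [ensemble_score(x, thresholds) for x in test_x]
preds = [1 if s >= 0.5 else 0 for s in scores]

recall, accuracy = recall_accuracy(test_y, preds)
auc_value = auc(test_y, scores)
print(f"recall={recall:.3f} accuracy={accuracy:.3f} auc={auc_value:.3f}")
```

Recall is the metric that matters most for rare outbreak days, because a missed outbreak (a false negative) lowers recall directly while barely moving accuracy on imbalanced data. In practice, a library implementation of easy ensemble with a random forest base estimator would presumably replace the toy learner used here.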

Citation: HE Lujia, WANG Ke, ZENG Yuting, ZHANG Xinli, LAN Xiaokai, ZHAO Shuzhen. Hospital data-driven early warning model for epidemic outbreaks based on ensemble learning techniques. West China Medical Journal, 2026, 41(4): 623-627. doi: 10.7507/1002-0179.202508125

Copyright © the editorial department of West China Medical Journal of West China Medical Publisher. All rights reserved.