Objective To integrate data generated by regional general hospitals and propose an ensemble learning approach for accurately forecasting major epidemic outbreaks.

Methods Using preprocessed, multi-source data from a large general hospital in Southwest China covering January 2020 to December 2022, we first identified the key early-warning departments, which yielded the original variables for the forecasting framework. Principal component analysis was then used to extract the main modeling factors. Ten common prediction models were built, and the base model was selected by comparing 5 performance indicators and optimizing parameters. The base model was then combined with 6 sampling techniques under 3 data-balancing strategies to construct sampling-based learning models. After performance comparison, an ensemble prediction model with confirmed cases as the outcome indicator was established.

Results Eight original variables (online or offline) were identified. Among them, the number of fever outpatient visits, the number of patients with positive signs in the Department of Integrated Traditional Chinese and Western Medicine, the volume of online consultations in the Department of Respiratory and Critical Care Medicine, and the volume of online consultations in the Infectious Disease Center could be used as predictors for epidemic forecasting. After collinearity was eliminated, 3 principal components were extracted. Random forest was selected as the base model from the 10 initial models. Among the sampling-based ensemble models, the easy ensemble classifier-random forest (EEC-RF) model showed comparative advantages, with a recall of 0.857, an accuracy of 0.812, and an area under the curve (AUC) of 0.911.

Conclusions The EEC-RF model offers better classification and generalization than the compared models when handling minority-class epidemic events, improving the accuracy of forecasting sudden outbreaks.
In particular, the model's high recall, accuracy, and AUC support its use as a primary tool for public health early-warning systems.
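
The balancing idea behind easy-ensemble style models can be illustrated with a minimal sketch: each base learner is trained on all minority-class samples plus an equally sized random draw from the majority class, and the learners then vote. The sketch below is an illustration under stated assumptions, not the study's implementation. The data are synthetic, the function names are hypothetical, and a one-feature threshold rule stands in for the random forest base model so that the example needs only the Python standard library.

```python
import random


def balanced_subsets(labels, n_subsets, seed=0):
    """Return index lists that each hold every minority sample (label 1)
    plus an equally sized random draw of majority samples (label 0)."""
    rng = random.Random(seed)
    minority = [i for i, y in enumerate(labels) if y == 1]
    majority = [i for i, y in enumerate(labels) if y == 0]
    return [minority + rng.sample(majority, len(minority))
            for _ in range(n_subsets)]


def fit_threshold(xs, ys):
    """Stand-in base learner (assumption: replaces the random forest).
    The decision threshold is the midpoint between the two class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    return (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2


def ensemble_predict(thresholds, x):
    """Majority vote over the learners trained on the balanced subsets."""
    votes = sum(1 for t in thresholds if x > t)
    return 1 if 2 * votes > len(thresholds) else 0


def recall(y_true, y_pred):
    """Share of true positive cases that the model detects."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Share of all cases that the model classifies correctly."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Synthetic toy data (not the study's data): 90 majority and 10 minority cases.
data_rng = random.Random(42)
x = ([data_rng.gauss(0.0, 1.0) for _ in range(90)]
     + [data_rng.gauss(3.0, 1.0) for _ in range(10)])
y = [0] * 90 + [1] * 10

subsets = balanced_subsets(y, n_subsets=5)
thresholds = [fit_threshold([x[i] for i in idx], [y[i] for i in idx])
              for idx in subsets]
y_pred = [ensemble_predict(thresholds, xi) for xi in x]

print("recall:", recall(y, y_pred))
print("accuracy:", accuracy(y, y_pred))
```

Recall is reported alongside accuracy because, with outbreak events in the minority, a model can reach high accuracy while missing most positive cases.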