Location: Arthropod-borne Animal Diseases Research
Title: A windowed correlation based feature selection method to improve time series prediction of dengue fever cases
Author
FERDOUSI, TANVIR - Kansas State University
Cohnstaedt, Lee
SCOGLIO, CATERINA - Kansas State University
Submitted to: IEEE Access
Publication Type: Peer Reviewed Journal
Publication Acceptance Date: 10/4/2021
Publication Date: 11/23/2021
Citation: Ferdousi, T., Cohnstaedt, L.W., Scoglio, C. 2021. A windowed correlation based feature selection method to improve time series prediction of dengue fever cases. IEEE Access. 9:141210-141222. https://doi.org/10.1109/ACCESS.2021.3120309
Interpretive Summary: Forecasting disease outbreaks relies on good, complete data sets, but these are almost never available. Filling temporal or spatial gaps in a data set therefore requires mathematical models that can bridge the gaps, or that draw on information from surrounding areas or time periods, so that accurate forecasts remain possible. This work focuses on dengue fever forecasts and how to improve the predictions using data from the surrounding geographic areas. Similar locations likely have similar problems, and their data can compensate for under- and over-reporting in specific areas. The main difficulty is deciding how many areas to include and assessing the quality of their data. A novel framework was created that examines windows of time in the surrounding areas to quantify how similar they are to the area in question. Neural network-based prediction models achieved up to 33.6% more accurate predictions compared to using only the target area's data. Two techniques are reported for windowing the time series (fixed time windows and outbreak detection windows), and both perform comparably. The framework is application-independent and can therefore be used to improve other models where data are lacking but spatially adjacent areas have data.
Technical Abstract: The performance of data-driven prediction models depends on the availability of data samples for model training. A model that learns about dengue fever incidence in a population uses historical data from the corresponding location.
Prediction performance can be poor in places with inadequate data. This work aims to enhance temporally limited dengue case data by methodological addition of epidemically relevant data from nearby locations as predictors (features). A novel framework is presented for windowing incidence data and computing time-shifted correlation-based metrics to quantify feature relevance. The framework ranks incidence data of adjacent locations around a target location by combining the correlation metric with two other metrics: spatial distance and local prevalence. Recurrent neural network-based prediction models achieve up to 33.6% accuracy improvement on average using the proposed method compared to using training data from the target location only. These models achieved mean absolute error (MAE) values as low as 0.128 on [0,1] normalized incidence data for a municipality with the highest dengue prevalence in Brazil’s Espirito Santo. When predicting cases aggregated over geographical ecoregions, the models achieved accuracy improvements of up to 16.5%, using only 6.5% of incidence data from ranked feature sets. The paper also includes two techniques for windowing time series data: fixed-sized windows and outbreak detection windows. Both techniques perform comparably, while the outbreak detection window method uses less data for computations. The framework presented in this paper is application-independent, and it could improve the performance of prediction models where data from spatially adjacent locations are available.
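As a rough illustration of the windowed, time-shifted correlation idea described in the abstract, the sketch below splits two incidence series into fixed-size windows, finds the best Pearson correlation over a small range of lags within each window, and averages the results. It is a minimal sketch under stated assumptions, not the metric defined in the paper: the function names (`pearson`, `best_lagged_corr`, `windowed_score`), the choice to average per-window maxima, the window length, the lag range, and the synthetic data are all hypothetical. It also leaves out the spatial distance and local prevalence metrics that the framework combines with the correlation metric.

```python
import math
import random


def pearson(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if undefined)."""
    n = len(a)
    if n < 2:
        return 0.0
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # a constant series has no defined correlation
    return cov / math.sqrt(var_a * var_b)


def best_lagged_corr(target, neighbor, max_lag):
    """Highest correlation of target[t] with neighbor[t - lag], lag in 0..max_lag."""
    return max(
        pearson(target[lag:], neighbor[:len(neighbor) - lag])
        for lag in range(max_lag + 1)
    )


def windowed_score(target, neighbor, window, max_lag):
    """Mean of the best lagged correlation over non-overlapping fixed windows."""
    scores = [
        best_lagged_corr(target[s:s + window], neighbor[s:s + window], max_lag)
        for s in range(0, len(target) - window + 1, window)
    ]
    return sum(scores) / len(scores) if scores else 0.0


# Synthetic weekly incidence (assumed data, for illustration only).
base = [math.sin(2 * math.pi * w / 26) + 1.0 for w in range(106)]
target = base[:104]
leader = base[2:]  # a neighbor whose curve runs two weeks ahead of the target
rng = random.Random(0)
noise = [rng.random() for _ in range(104)]  # an unrelated neighbor

leader_score = windowed_score(target, leader, window=26, max_lag=4)
noise_score = windowed_score(target, noise, window=26, max_lag=4)
print(leader_score, noise_score)
```

In this toy setup the neighbor that leads the target by two weeks scores higher than the unrelated series, which is the kind of ordering a correlation-based feature ranking relies on. Replacing the fixed windows with windows placed around detected outbreaks would correspond to the second windowing technique the abstract mentions.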