Uncertainty-aware approach for multiple imputation using conventional and machine learning models: a real-world data study

Bibliographic record details
Title:	Uncertainty-aware approach for multiple imputation using conventional and machine learning models: a real-world data study
Authors:	Wabina, Romen Samuel, Looareesuwan, Panu, Sonsilphong, Suphachoke, Teza, Htun, Ponthongmak, Wanchana, McKay, Gareth, Attia, John, Pattanateepapon, Anuchate, Panitchote, Anupol, Thakkinstian, Ammarin
Source:	Journal of Big Data, Vol 12, Iss 1, Article 95, Pp 1-28 (2025). https://doi.org/10.1186/s40537-025-01136-3
Publisher information:	Springer Science and Business Media LLC, 2025.
Publication year:	2025
Subject terms:	Computer engineering. Computer hardware, Missing data, Computer Networks and Communications, Information technology, QA75.5-76.95, T58.5-58.64, Real-world data, TK7885-7895, Information Systems, Hardware and Architecture, Uncertainty functions, Electronic computers. Computer science, Multiple imputation, Uncertainty-aware models, Information Systems and Management
Description:	Missing data poses a significant challenge in clinical real-world studies, often arising from unplanned data collection, misplacement, patient loss to follow-up, and other factors. While multiple imputation by chained equations (MICE) is a widely used method, its sequential nature introduces uncertainty, potentially impacting prediction model performance. We proposed and evaluated three uncertainty-aware functions (i.e., uncertainty sampling (US), probability of improvement (PI), and expected improvement (EI)) integrated with linear regression (LinearReg), decision tree (DT), random forest (RF), and extreme gradient boosting (XGBoost) using three large datasets with high missing rates: a chronic kidney disease cohort (CKD, n = 31,043) and hypertension cohorts from Ramathibodi Hospital (HT-RAMA, n = 140,047) and Khon Kaen University Hospital (HT-KKU, n = 108,942). In the CKD cohort, uncertainty-aware models significantly improved performance (evaluated by root mean squared error (RMSE) and mean absolute error (MAE)) over standard MICE, except for XGBoost. LinearReg-EI performed best (RMSE 0.12, MAE 0.36), followed by RF-EI (RMSE 0.22, MAE 0.34) and DT-EI (RMSE 0.21, MAE 0.38). In HT-RAMA, LinearReg-US performed best (RMSE 0.24, MAE 8.15), outperforming RF-US (RMSE 0.92, MAE 8.58) and DT-PI (RMSE 0.96, MAE 8.74). Similarly, in HT-KKU, LinearReg-US performed best (RMSE 0.98, MAE 12.00), followed by RF-PI (RMSE 1.93, MAE 12.90) and DT-US (RMSE 2.10, MAE 12.63). Uncertainty-aware models produced imputed distributions closely resembling the original data, unlike standard MICE. Our findings suggest that incorporating uncertainty functions can improve MICE, particularly for LinearReg, RF and DT. Further research is warranted to validate these findings across diverse clinical settings and model types.
Document type:	Article
File description:	application/pdf
Language:	English
ISSN:	2196-1115
DOI:	10.1186/s40537-025-01136-3
Access link:	https://doaj.org/article/bcc735dd92bc418ca17021dd249e6f1a https://pure.qub.ac.uk/en/publications/af563164-c38d-4191-acbf-a39d6fee8c28
Rights:	CC BY
Accession number:	edsair.doi.dedup.....2ffc56379e3896bf775bf715c70f2ba5
Database:	OpenAIRE
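
Note on the uncertainty functions: the abstract names uncertainty sampling (US), probability of improvement (PI), and expected improvement (EI) but this record does not give their formulas or say how they are wired into MICE. The sketch below shows only the textbook forms of these three scores under a Gaussian predictive distribution, written for a quantity to be minimised. The names `mu`, `sigma`, `best`, and `xi`, and the idea of scoring a candidate by its predicted error, are assumptions made for illustration and may differ from the authors' definitions.

```python
import math


def _norm_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def _norm_cdf(z):
    """Standard normal cumulative distribution, via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def uncertainty_sampling(mu, sigma):
    """US: the score is simply the predictive standard deviation."""
    return sigma


def probability_of_improvement(mu, sigma, best, xi=0.0):
    """PI: probability that the predicted value falls below `best - xi`.

    Assumption: lower is better (e.g. an error measure).
    """
    improvement = best - mu - xi
    if sigma <= 0.0:
        # Degenerate case: no predictive spread, so the outcome is certain.
        return 1.0 if improvement > 0.0 else 0.0
    return _norm_cdf(improvement / sigma)


def expected_improvement(mu, sigma, best, xi=0.0):
    """EI: expected amount by which the prediction undercuts `best - xi`."""
    improvement = best - mu - xi
    if sigma <= 0.0:
        return max(improvement, 0.0)
    z = improvement / sigma
    return improvement * _norm_cdf(z) + sigma * _norm_pdf(z)


if __name__ == "__main__":
    # Hypothetical candidate: predicted error 0.30 +/- 0.10, best so far 0.35.
    mu, sigma, best = 0.30, 0.10, 0.35
    print("US:", uncertainty_sampling(mu, sigma))
    print("PI:", probability_of_improvement(mu, sigma, best))
    print("EI:", expected_improvement(mu, sigma, best))
```

US ignores the predicted mean entirely, PI rewards any improvement regardless of size, and EI weighs both the chance and the size of the improvement; which of these trade-offs matters in an imputation loop depends on details available only in the full article.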
