A reference machine learning model for prediction of cholera epidemics based-on seasonal weather changes linkages in Tanzania

Leo, Judith

dc.creator	Leo, Judith
dc.date	2020-09-14T07:40:36Z
dc.date	2020-08
dc.date.accessioned	2022-10-25T09:15:30Z
dc.date.available	2022-10-25T09:15:30Z
dc.identifier	https://dspace.nm-aist.ac.tz/handle/20.500.12479/897
dc.identifier.uri	http://hdl.handle.net/123456789/94535
dc.description	A Thesis Submitted in Fulfilment of the Requirements for the Degree of Doctor of Philosophy in Information and Communication Science and Engineering of the Nelson Mandela African Institution of Science and Technology
dc.description	Cholera epidemics have remained a public health threat throughout history, affecting vulnerable populations living with unreliable water supplies and sub-standard sanitary conditions. Studies have observed that the occurrence of cholera is also strongly linked to seasonal weather patterns. Over the past decades, there have been great achievements in developing cholera epidemic models, which have focused on mathematical techniques. However, most existing prediction systems face challenges: they lack flexibility, are not user-friendly, are ineffective, and do not integrate essential weather variables. In addition, advanced technologies such as machine learning (ML) have not been explicitly deployed in modeling cholera epidemics in developing countries, including Tanzania, because of the challenges that come with their datasets, such as missing information, data inconsistency, class imbalance and other uncertainties. The aim of this work was to overcome the challenges of existing cholera epidemic models and complement them by taking advantage of ML techniques, namely by developing an ML model capable of predicting cholera epidemic outbreaks based on their linkage to seasonal weather changes in Tanzania. Secondary datasets from the Tanzania Meteorological Agency (TMA), the Ministry of Health and Social Welfare, and the Dar es Salaam Water and Sewerage Authority (DAWASCO) were used. Then, the Adaptive Synthetic Sampling Approach (ADASYN) and Principal Component Analysis (PCA) were applied to restore class balance and reduce the dimensionality of the dataset. To determine which ML algorithms were best able to predict (yes/no) whether a cholera epidemic would occur given the weather variables, ten classification algorithms were evaluated using the F1-score, sensitivity and balanced accuracy metrics. The Friedman test was then used to determine whether the differences in model performance were statistically significant.
Results showed that the Random Forest, Bagging, and ExtraTree classifiers had the best performance, with 74%, 74.1% and 71.9% accuracy respectively. An ensemble method of model fine-tuning was then applied to obtain one model from the three, and an overall accuracy of 78.5% was achieved. Lastly, the selected final model was evaluated in four steps. The first re-ran the final model on the same dataset but without the weather variables, and confirmed that the model with weather variables performed better than the model without them. The second re-ran the model-development procedure using datasets from the Tanga and Songwe regions in order to illustrate how the adaptive reference model can be referenced by other researchers. The third and fourth evaluations used a mixed-design approach of quantitative and qualitative methods, with focus group discussions and interviewer-administered questionnaires involving 500 and 20 stakeholders respectively (including medical officers, epidemiological analysts, nurses, environmental experts, ICT experts and cholera patients). In the third evaluation, 90% of the responses agreed that the developed model is robust and appropriate for use in least developed countries towards effective prediction of cholera epidemics. The fourth evaluation showed that the cholera ML model is better than cholera statistical models in terms of usability, expandability and computational complexity. Overall, the study improved our understanding of the significant role of ML strategies in health-care data. However, the study could not be treated as a time series problem due to data collection bias, such as data inconsistency in terms of time.
The study recommends a review of health-care systems in order to facilitate quality data collection and further deployment of ML techniques in the health sector in Tanzania.
dc.format	application/pdf
dc.language	en
dc.publisher	NM-AIST
dc.rights	Attribution-NonCommercial-ShareAlike 4.0 International
dc.rights	http://creativecommons.org/licenses/by-nc-sa/4.0/
dc.subject	Research Subject Categories::NATURAL SCIENCES
dc.title	A reference machine learning model for prediction of cholera epidemics based-on seasonal weather changes linkages in Tanzania
dc.type	Thesis
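
The abstract in this record names the evaluation metrics (F1-score, sensitivity, balanced accuracy) and the Friedman test used to compare classifiers. As a rough illustration only, and not the thesis's actual code, the sketch below computes those quantities in plain Python. The function names, the labels and the per-fold scores are all hypothetical and invented for this example; the Friedman statistic is the basic form without a tie correction.

```python
from typing import Dict, List, Sequence


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Sensitivity, specificity, balanced accuracy and F1 for labels in {0, 1}."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    denom = precision + sensitivity
    f1 = 2 * precision * sensitivity / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": (sensitivity + specificity) / 2,
        "f1": f1,
    }


def average_ranks(row: Sequence[float]) -> List[float]:
    """Rank scores within one fold (1 = highest); tied scores share the mean rank."""
    order = sorted(range(len(row)), key=lambda j: -row[j])
    ranks = [0.0] * len(row)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < len(order) and row[order[j + 1]] == row[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            ranks[order[pos]] = mean_rank
        i = j + 1
    return ranks


def friedman_statistic(scores: Sequence[Sequence[float]]) -> float:
    """Friedman chi-square for an n_folds x k_models table (no tie correction)."""
    n, k = len(scores), len(scores[0])
    rank_sums = [0.0] * k
    for row in scores:
        for j, r in enumerate(average_ranks(row)):
            rank_sums[j] += r
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


# Made-up labels: 1 = epidemic, 0 = no epidemic (not data from the thesis).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 1]
print(binary_metrics(y_true, y_pred))

# Made-up balanced-accuracy scores: rows are folds, columns are three models.
fold_scores = [
    [0.74, 0.73, 0.70],
    [0.75, 0.74, 0.72],
    [0.72, 0.74, 0.71],
    [0.76, 0.75, 0.73],
]
print(friedman_statistic(fold_scores))
```

The resulting statistic would be compared against a chi-square distribution with k - 1 degrees of freedom (k being the number of models) to judge whether the models differ significantly.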

Files in this item

Files	Size	Format	View
PhD_ICSE_Judith_Leo_2020.pdf	4.915Mb	application/pdf	View/Open

This item appears in the following Collection(s)

PhD Theses and Dissertations [CoCSE] [37]
This is the collection for PhD Theses and Dissertations [CoCSE]
