A Project Report Submitted in Partial Fulfillment of the Requirements for the Degree of Master of Science in Embedded and Mobile Systems of the Nelson Mandela African Institution of Science and Technology
Diabetes is a chronic metabolic disease characterized by elevated levels of blood glucose
(blood sugar) that over time can cause severe damage to vital organs, including the heart, blood
vessels, eyes, kidneys and nerves. Diabetes is therefore one of the major priorities in medical
science research. Type 2 diabetes is common in adults and arises either from inadequate insulin
production or from the failure of the body’s cells to respond properly to the insulin produced.
Type 2 accounts for 90% of all diabetes cases. According to World Health Organization
statistics, of the 422 million people with diabetes worldwide, 336 million live in developing
countries, and 1.6 million people die of diabetes each year.
Around 19.8 million adults in Africa have Type 2 diabetes, but approximately 75% are
undiagnosed and therefore unaware of their condition. Many remain undiagnosed because they
do not recognise the symptoms of diabetes, and others because testing kits are scarce,
especially in rural areas. African governments have scaled up the purchase and distribution
of diagnostic kits, but the majority of the population has not yet been reached. Researchers have
been developing predictive models for Type 2 diabetes, but African populations are not widely
represented in their datasets. The resulting models may therefore fail to identify at-risk
populations accurately in the African context. The main aim of this research was to develop a
machine learning prediction model that identifies Ugandans likely to have Type 2
diabetes (output classes: high risk or low risk) based on input symptoms.
Random Forest, Support Vector Machine, Naïve Bayes, and AdaBoost classifiers were trained
on anonymised, real patient data with twelve features: age, gender (male or female),
systolic blood pressure, diastolic blood pressure, residence (town or village), family member
with diabetes, alcohol intake, smoker, hypertensive, obesity, physically inactive and body mass
index. A comparison of the Accuracy Score and Confusion Matrix of these algorithms showed
that the Random Forest classifier performed best, with an accuracy score of 85.4%, and that
its performance was significantly superior to that of the other machine learning
algorithms.
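
For readers unfamiliar with the two evaluation measures named above, the following is a minimal sketch of how a confusion matrix and an accuracy score can be computed for the two output classes (high risk or low risk). It is not the code used in this research: the function names `confusion_matrix` and `accuracy` and the example labels are assumptions made purely for illustration, and the labels are invented rather than taken from the study data.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred, positive="high risk"):
    """Return (tp, fp, fn, tn), treating `positive` as the positive class."""
    # Each pair records (actual is positive?, predicted is positive?).
    counts = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = counts[(True, True)]    # high risk correctly flagged
    fn = counts[(True, False)]   # high risk missed
    fp = counts[(False, True)]   # low risk wrongly flagged
    tn = counts[(False, False)]  # low risk correctly cleared
    return tp, fp, fn, tn


def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that match the actual label."""
    total = tp + fp + fn + tn
    return (tp + tn) / total if total else 0.0


# Invented labels for illustration only; NOT data from this study.
y_true = ["high risk", "high risk", "low risk", "low risk",
          "high risk", "low risk", "low risk", "high risk"]
y_pred = ["high risk", "low risk", "low risk", "low risk",
          "high risk", "high risk", "low risk", "high risk"]

tp, fp, fn, tn = confusion_matrix(y_true, y_pred)
print(f"TP={tp} FP={fp} FN={fn} TN={tn}")
print(f"accuracy={accuracy(tp, fp, fn, tn):.2f}")
```

Accuracy alone hides which kind of error a classifier makes. In a screening setting, a missed high-risk patient (a false negative) may matter more than a false alarm, which is one reason to read the confusion matrix alongside the accuracy score.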