Dissertation (MSc Computer Science)
Information generated on social networking sites spreads widely because more people are
connected. Connected users chat and comment by posting content such as images,
videos, and messages. Social networks have been, and remain, useful to communities in that
they bring relatives together, especially in sharing experiences and feelings. Although social
networks have been beneficial to users, some of the shared messages and comments contain
sexual and political harassment. This is particularly true in Kiswahili-speaking countries
such as Tanzania. On most, if not all, Kiswahili social networking sites, offensive messages have
been, and still are, posted publicly. These messages harass, embarrass, and even assault users and, to
some extent, have adverse psychological effects. This study proposes a framework for automating the
detection of offensive messages on social networks in Kiswahili settings by applying
selected machine learning techniques. Specifically, the study created a Kiswahili dataset containing
sexually and politically offensive messages and normal messages. All of these messages were
collected from Facebook, YouTube, and JamiiForum and were used to evaluate the
performance of the selected text classification algorithms. The collected messages were
converted into feature vectors using the Bag-of-Words (BoW) model, Term Frequency-Inverse Document
Frequency (TF-IDF), and N-gram techniques. The experimental
findings using the generated feature vectors showed that the Random Forest classifier
assigned messages to the correct class label with an accuracy of 95.0259%, an F1-measure
of 0.950 (95.0%), and a false positive rate of 2.8% when applied to the three-category dataset.
The linear SVM, on the other hand, showed better results when applied to the two-category
dataset. The study recommends that the REST API-based framework, with the Random Forest
classifier and the Kiswahili dataset, be deployed in real social net