A Dissertation Submitted in Partial Fulfillment of the Requirements for the Degree of Master’s in Information and Communication Science and Engineering of the Nelson Mandela African Institution of Science and Technology
For many years, plagiarism has remained a serious problem in Higher Learning Institutions
(HLIs). Despite having adverse effects on the quality of education, plagiarism has been rapidly
increasing in HLIs across the globe. Many researchers agree that the ease of access to information
over the internet has made plagiarism a common occurrence in Tanzanian HLIs. In order to
address this problem, many Tanzanian HLIs have enacted strict anti-plagiarism policies that
require the use of software to detect plagiarism cases efficiently. Although many free and
commercial plagiarism detection tools exist, HLIs in Tanzania face numerous challenges in
finding appropriate ones. The accuracy and effectiveness of freely available plagiarism detection
software have been repeatedly questioned because such tools often produce inconsistent results. Moreover,
commercial software that promises better performance carries high annual subscription fees that
HLIs in developing countries cannot easily afford. This study aimed to address the
need for affordable and reliable plagiarism detection tools in Tanzanian HLIs by developing an
efficient algorithm for plagiarism detection. The study employed a systematic approach that
involved different stakeholders in Tanzanian HLIs, including students, academic staff, and support
staff. Questionnaires, unstructured interviews, and thorough literature analysis were used to
identify the stakeholders’ needs and establish user requirements. The study proposed a plagiarism
detection algorithm that combines a machine learning approach to information extraction, graph-based
information retrieval, and semantic textual similarity methods. A web-based plagiarism detection
system that implements the proposed algorithm was developed using open-source technologies
such as the Symfony web framework, the Neo4j graph database, the MySQL database, and RabbitMQ.
User Acceptance Testing (UAT) results showed that the stakeholders responded positively to
the developed algorithm. Furthermore, the developed web-based plagiarism detection system
has received a Copyright Clearance Certificate from The Copyright Association of Tanzania
(COSOTA), and The Committee of Vice Chancellors, Principals and Provosts of Tanzania (CVCPT)
has recommended the deployment of the system in Tanzanian HLIs.