Dissertation (MSc Information Technology)
The purpose of this study was to design and test a data cleaning prototype for use in data processing at LAPF Pensions Fund. The specific objectives were: (1) to analyse the root causes of data errors and identify the existing data errors in the LAPF database; (2) to design, implement and test a data cleaning prototype to correct the errors identified; and (3) to test the features of the data cleaning prototype in the LAPF database. A judgemental sampling approach was used to select 19 respondents who were experts in loading data into the application system at LAPF Pensions Fund. Data were collected through questionnaires, observation and document reviews, and were analysed using STATA software and Excel 2013 to establish their authenticity. The V-Model software development approach was applied to develop the data cleaning prototype. The results revealed that the multiple-capture approach and data migration were the major causes of data errors. The common data errors reported by respondents were mostly of the punctuation type, namely the comma, the dot, quotation marks, the semicolon, one, zero, the vertical bar and brackets. After the sources of errors were identified, a data cleaning prototype capable of inconsistency checking was developed. The research results showed that the designed data cleaning prototype improved the authenticity of data at LAPF.
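As an illustration of the kind of check described above, the following minimal Python sketch flags and removes the reported error characters from a text field. It is not the prototype developed in this study: the function names `find_suspect_chars` and `clean_name`, the choice of a free-text name field, and the sample values are all assumptions made for the example. Treating one and zero as errors is only sensible in fields where digits should not occur.

```python
# Characters reported by respondents as common errors: comma, dot, quotation
# marks, semicolon, one, zero, vertical bar and brackets.
# Assumption: the field being checked is a free-text name field, so digits
# and punctuation are both unexpected there.
SUSPECT_CHARS = frozenset(",.;|()[]\"'10")


def find_suspect_chars(value):
    """Return the suspect characters found in value, in order of appearance."""
    return [ch for ch in value if ch in SUSPECT_CHARS]


def clean_name(value):
    """Remove suspect characters and collapse the whitespace left behind."""
    kept = "".join(ch for ch in value if ch not in SUSPECT_CHARS)
    return " ".join(kept.split())


if __name__ == "__main__":
    # Hypothetical records, not taken from the LAPF database.
    for raw in ["JOHN, DOE.", "MARY| (SMITH)", "ALI0 HASSAN"]:
        print(repr(raw), find_suspect_chars(raw), repr(clean_name(raw)))
```

Separating detection from correction allows flagged records to be reviewed before any value is changed, which matters when a character such as the dot may be legitimate in some fields.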