Dissertation (MSc Computer Science)
Online Social Networks (OSNs) provide an active space for digital human interaction and are used daily. Human engagement is reflected in the dynamics of OSNs, where a fundamental problem, known as link prediction, is to infer future interactions on the network. Most studies have employed classical algorithms, which consider node similarity, but have neglected link analysis algorithms, which consider topological structure. This study compares algorithms for predicting reciprocal interaction from para-social interaction. In particular, it selects PageRank and HITS, two well-known link analysis algorithms based on high-order heuristics. Network simulation was performed, using machine learning techniques, to assess how well these algorithms predict reciprocal link formation. Two datasets were used in the experiment to ensure the reliability of the results: first a publicly available secondary Twitter dataset, then a primary dataset crawled from Mayocoo. Both are directed networks, and the networks built from both datasets follow a power-law distribution. Resource Allocation was used as the baseline for the study after it outperformed Adamic-Adar, Jaccard Coefficient, and Preferential Attachment. The results showed that both PageRank and HITS surpassed the baseline in prediction performance. PageRank improved accuracy by 1.8%, with improvements of 4.8% in precision, 1.1% in recall, and 3% in F1-measure. HITS improved accuracy by 5%, with improvements of 15.1% in precision, 7.9% in recall, and 11.5% in F1-measure. These empirical results demonstrate that HITS outperforms PageRank in prediction performance. The computational test, however, showed that PageRank uses fewer computational resources than HITS.
This study recommends link analysis algorithms over classical algorithms for reciprocal link prediction in OSNs. HITS is recommended when prediction performance matters more than computational cost; otherwise, PageRank is recommended where computational resources are limited.
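
As a minimal sketch of the two link analysis algorithms compared here, the following Python code computes PageRank and HITS scores by power iteration on a small made-up directed graph and attaches them to one-way edges as candidate features for reciprocal link prediction. The toy edge list, the function names, the damping factor of 0.85, and the choice of feature vector are illustrative assumptions and are not taken from the study.

```python
from collections import defaultdict


def pagerank(edges, damping=0.85, iters=100, tol=1e-10):
    """PageRank by power iteration on a directed edge list (u follows v)."""
    edges = set(edges)
    nodes = sorted({n for e in edges for n in e})
    out = defaultdict(set)
    for u, v in edges:
        out[u].add(v)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Rank held by nodes without out-links is spread uniformly.
        dangling = sum(rank[u] for u in nodes if not out[u])
        new = {v: (1 - damping) / n + damping * dangling / n for v in nodes}
        for u in nodes:
            for v in out[u]:
                new[v] += damping * rank[u] / len(out[u])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


def hits(edges, iters=100, tol=1e-10):
    """HITS hub and authority scores, each normalised to sum to 1."""
    edges = set(edges)
    nodes = sorted({n for e in edges for n in e})
    hub = {v: 1.0 for v in nodes}
    auth = {v: 1.0 for v in nodes}
    for _ in range(iters):
        # Authority: sum of hub scores of the nodes pointing at v.
        new_auth = {v: 0.0 for v in nodes}
        for u, v in edges:
            new_auth[v] += hub[u]
        # Hub: sum of (updated) authority scores of the nodes u points at.
        new_hub = {v: 0.0 for v in nodes}
        for u, v in edges:
            new_hub[u] += new_auth[v]
        a_sum = sum(new_auth.values()) or 1.0
        h_sum = sum(new_hub.values()) or 1.0
        new_auth = {v: s / a_sum for v, s in new_auth.items()}
        new_hub = {v: s / h_sum for v, s in new_hub.items()}
        delta = sum(abs(new_auth[v] - auth[v]) + abs(new_hub[v] - hub[v])
                    for v in nodes)
        hub, auth = new_hub, new_auth
        if delta < tol:
            break
    return hub, auth


def reciprocal_candidates(edges):
    """One-way edges (u, v) whose reverse edge (v, u) is absent."""
    edge_set = set(edges)
    return sorted((u, v) for u, v in edge_set if (v, u) not in edge_set)


def candidate_features(edges):
    """Hypothetical feature vector per candidate: scores of both endpoints."""
    pr = pagerank(edges)
    hub, auth = hits(edges)
    return {(u, v): [pr[u], pr[v], hub[u], auth[v]]
            for u, v in reciprocal_candidates(edges)}


# Toy follower graph (assumed data): a and b follow each other,
# c follows a and b, d follows a.
toy_edges = [("a", "b"), ("b", "a"), ("c", "a"), ("d", "a"), ("c", "b")]

if __name__ == "__main__":
    for pair, feats in candidate_features(toy_edges).items():
        print(pair, [round(x, 4) for x in feats])
```

Feature vectors of this kind could then be passed to any standard classifier, with reciprocated and unreciprocated edges from a later network snapshot serving as labels. HITS maintains and updates two score vectors per iteration where PageRank maintains one, which is consistent with the difference in computational cost reported above.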