Dissertation (MSc Computer Science)
The main purpose of this dissertation was to evaluate the performance of Cassandra and HBase, two NoSQL databases in the column-oriented category, in handling streaming data. The dataset used for this evaluation was constructed with the help of the Twitter Streaming API. The environment used to evaluate the performance of Cassandra and HBase on streaming data was Apache Spark, chosen for its ability to plot streaming data from the source using Spark-R.
Several studies were reviewed to identify suitable evaluation metrics. The metrics found include computation time, memory used, bytes read and written, and CPU usage.
The performance benchmark of the two column-family NoSQL databases (Cassandra and HBase) was completed. The researcher benchmarked 4 different implementations by setting the time interval to 5 seconds, 10 seconds, 5 minutes and 10 minutes for 10 iterations with 20 days.
The performance of the two NoSQL databases was evaluated in terms of computation time, with throughput and latency as the metrics. Cassandra appears to have the better overall performance in write operations as the streaming workload increases, whereas HBase shows lower overall performance in computation because of its high average latencies, particularly in write operations. To improve the accuracy of the results, the results of each test were averaged across iterations.
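As an illustration of how throughput and latency can be averaged across iterations, the following is a minimal sketch and not the dissertation's actual benchmark code. The function names `summarize_run` and `average_runs` are hypothetical, and it assumes each iteration records the latency of every operation in seconds together with the total elapsed time of the iteration.

```python
from statistics import mean


def summarize_run(latencies_s, elapsed_s):
    """Return (throughput in ops/s, mean latency in s) for one iteration.

    latencies_s: per-operation latencies in seconds (assumed input format).
    elapsed_s:   wall-clock duration of the whole iteration in seconds.
    """
    if not latencies_s or elapsed_s <= 0:
        raise ValueError("need at least one operation and a positive duration")
    throughput = len(latencies_s) / elapsed_s
    return throughput, mean(latencies_s)


def average_runs(runs):
    """Average throughput and mean latency over several iterations.

    runs: iterable of (latencies_s, elapsed_s) pairs, one per iteration.
    """
    summaries = [summarize_run(lat, elapsed) for lat, elapsed in runs]
    avg_throughput = mean(tp for tp, _ in summaries)
    avg_latency = mean(lat for _, lat in summaries)
    return avg_throughput, avg_latency


if __name__ == "__main__":
    # Illustrative numbers only; these are not measurements from the study.
    example_runs = [
        ([0.010, 0.020, 0.030, 0.040], 2.0),
        ([0.020, 0.020], 1.0),
    ]
    print(average_runs(example_runs))
```

Averaging the per-iteration summaries, rather than pooling all operations, gives each iteration equal weight regardless of how many operations it completed; pooling would be an equally defensible choice.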