Abstract
Owing to their openness and low barrier to publishing, User-Generated Content (UGC) platforms facilitate the creation of huge amounts of data, a substantial share of which is inaccurate. Misleading, questionable and inaccurate content may have detrimental effects on people's beliefs and decision-making and may cause public disturbance. Consequently, there is a significant need to evaluate information coming from UGC platforms in order to differentiate credible information from misinformation and rumours. In this thesis, we present the need for research on the credibility of online Arabic information and argue that extending existing automated credibility assessment approaches with an extra step that evaluates the labellers will lead to a more robust dataset for building the credibility classification model.
This research focuses on modelling the credibility of Arabic information when judges disagree in their credibility scores and the ground truth of information credibility is not absolute. First, to achieve this goal, the study employs crowdsourcing, whereby users explicitly express their opinions about the credibility of a set of tweets. This information, coupled with data about the tweets’ features, enables us to identify the prominent message features most used in determining information credibility levels. Then, experiments based on both statistical analysis of the features’ distributions and machine learning methods are performed to predict and classify messages’ credibility levels. A novel credibility assessment model is proposed that integrates the labellers’ reliability weights when deriving the credibility labels for the messages in the training and testing datasets. This credibility model primarily uses similarity and accuracy rating measurements to determine the labellers’ weights.
To evaluate the proposed model, we compare the labels obtained from the expert labellers with those from the weighted crowd labellers. Empirical evidence suggests that, when both are measured against the experts’ ratings, the credibility model is superior to the commonly used majority voting baseline. The experimental results show a reduction in the effect of unreliable labellers’ credibility judgments and a moderate improvement in the credibility classification results.
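As an illustration only, the difference between the majority voting baseline and a reliability-weighted aggregation of crowd labels can be sketched as follows. This is a minimal sketch, not the thesis's model: the function names, the label strings and the numeric weights are all hypothetical, and the weights are simply given as inputs here, whereas the thesis derives them from similarity and accuracy rating measurements that the abstract does not specify.

```python
from collections import Counter, defaultdict


def majority_vote(labels):
    """Return the label chosen by most labellers.

    labels: dict mapping labeller id -> label.
    Ties are broken alphabetically (an arbitrary choice for this sketch).
    """
    counts = Counter(labels.values())
    best = max(counts.values())
    return min(label for label, count in counts.items() if count == best)


def weighted_vote(labels, weights):
    """Return the label with the largest total labeller weight.

    weights: dict mapping labeller id -> reliability weight (hypothetical
    values; labellers without a weight contribute nothing).
    """
    totals = defaultdict(float)
    for labeller, label in labels.items():
        totals[label] += weights.get(labeller, 0.0)
    best = max(totals.values())
    return min(label for label, total in totals.items() if total == best)


# Hypothetical judgments for one tweet from three crowd labellers.
tweet_labels = {"a": "credible", "b": "not_credible", "c": "not_credible"}
# Hypothetical reliability weights: labeller "a" is assumed most reliable.
weights = {"a": 0.9, "b": 0.3, "c": 0.4}

majority_result = majority_vote(tweet_labels)
weighted_result = weighted_vote(tweet_labels, weights)
print(majority_result, weighted_result)
```

In this toy case the two less reliable labellers outvote the more reliable one under majority voting, while the weighted aggregation follows the more reliable labeller, which is the kind of effect the weighting step is intended to have.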
|Date of Award
|Solon Pissis (Supervisor) & Costas Iliopoulos (Supervisor)