A Comparison of Retweet Prediction Approaches: The Superiority of Random Forest Learning Method
Abstract: We consider the
following retweet prediction task: given a tweet, predict whether it will be retweeted.
In the past, a wide range of learning methods and features has been proposed
for this task. We provide a systematic comparison of the performance of these
learning methods and features in terms of prediction accuracy and feature
importance. Specifically, from each previously published approach we take the
best-performing features and group these into two sets: user features and tweet
features. In addition, we contrast five learning methods, both linear and
non-linear. On top of that, we examine the added value of a previously proposed
time-sensitive modeling approach. To the authors’ knowledge, this is the first attempt
to collect the best-performing features and contrast linear and non-linear learning
methods. We perform our comparisons on a single dataset and find that user
features such as the number of times a user is listed, number of followers, and
average number of tweets published per day most strongly contribute to
prediction accuracy across the selected learning methods. We also find that a
random forest-based learning method, which has not been employed in previous studies,
achieves the highest performance among the learning methods we consider.
Finally, we find that, on top of properly tuned learning methods, the benefits of
time-sensitive modeling are very limited.
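As an illustration of how the user features named above might be computed, here is a minimal Python sketch. It assumes a user record shaped like the Twitter API v1.1 user object (`listed_count`, `followers_count`, `statuses_count`, `created_at`) and assumes that "average number of tweets published per day" means the lifetime tweet count divided by the account age in days. The function name `user_features` and the sample values are hypothetical; this is not necessarily how the original study derived these features.

```python
from datetime import datetime, timezone


def user_features(user: dict, now: datetime) -> dict:
    """Hypothetical helper: the three user features highlighted in the abstract.

    Assumes `user` follows the Twitter API v1.1 user-object field names and
    that tweets-per-day = lifetime tweet count / account age in days.
    """
    # v1.1 timestamps look like "Wed Oct 10 20:19:24 +0000 2018".
    created = datetime.strptime(user["created_at"], "%a %b %d %H:%M:%S %z %Y")
    # Clamp the age to at least one day so brand-new accounts do not
    # cause a division by zero or an inflated rate.
    age_days = max((now - created).total_seconds() / 86400.0, 1.0)
    return {
        "listed_count": user["listed_count"],
        "followers_count": user["followers_count"],
        "tweets_per_day": user["statuses_count"] / age_days,
    }


if __name__ == "__main__":
    # Made-up example record: 730 tweets over exactly 730 days.
    sample = {
        "created_at": "Mon Jan 01 00:00:00 +0000 2018",
        "listed_count": 12,
        "followers_count": 3400,
        "statuses_count": 730,
    }
    print(user_features(sample, datetime(2020, 1, 1, tzinfo=timezone.utc)))
```

In a full pipeline, vectors like this would be concatenated with tweet features and passed to each learning method under comparison.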
Authors: Hendra Bunyamin, Tomas Tunys
Journal Code: jptkomputergg160259