Abstract :
[en] Collaborative software development through GitHub repositories frequently relies on bot accounts to automate repetitive and error-prone tasks. This highlights the need to have accurate and efficient bot identification tools. Several such tools have been proposed in
the past, but they tend to rely on a substantial amount of historical data, or they limit themselves to a reduced subset of activity types, making them difficult to use at large scale. To overcome these limitations, we developed RABBIT, an open source command-line tool that queries the GitHub Events API to retrieve the recent events of a given GitHub account and predicts whether the account is a human or a bot. RABBIT is based on an XGBoost classification model that relies on six features related to account activities and
achieves high performance, with an AUC, F1 score, precision and recall of 0.92. Compared to the state-of-the-art in bot identification, RABBIT exhibits a similar performance in terms of precision, recall and F1 score, while being more than an order of magnitude faster
and requiring considerably less data. This makes RABBIT usable on a large scale, capable of processing several thousand accounts per hour efficiently.
Funding text :
This work is supported by Service Public de Wallonie Recherche under grant number 2010235 - ARIAC by DigitalWallonia4.AI, and by the Fonds de la Recherche Scientifique – FNRS under grant numbers J.0147.24, T.0149.22, and F.4515.23.
Scopus citations®
without self-citations
3