Abstract :
[en] Software repositories hosted on GitHub frequently use development bots to automate repetitive, effort-intensive and error-prone tasks. To study how these bots are used, state-of-the-art bot identification tools have been developed that detect bots based on their comments in commits, issues and pull requests. Because bots can be involved in many other activity types, there is a need to consider a wider range of the activities they carry out in those repositories. We therefore propose a curated dataset of such activities carried out by bots and humans involved in GitHub repositories. The dataset was constructed by identifying 24 high-level activity types that could be extracted from 15 lower-level event types queried from GitHub's event stream API for all considered bots and humans. The proposed dataset contains around 834K activities performed by 408 bots and 655 humans during an observation period ranging from 25 November 2022 to 9 March 2023. By enabling analysis of the activity patterns of bots and humans, this dataset could lead to better bot identification tools and to empirical studies on the role bots play in collaborative software development.
Funding text :
This work is supported by the DigitalWallonia4.AI research project ARIAC (grant number 2010235), as well as by the Fonds de la Recherche Scientifique – FNRS under grant numbers F.4515.23, O.0157.18F-RG43 and T.0149.22.