GitHub Actions is the de facto workflow automation tool for GitHub repositories. Its popularity has increased dramatically in recent years, opening up opportunities for empirical studies of its usage. To enable such studies, we implemented gigawork, an open-source tool for extracting the commit histories of changes to workflow files in GitHub repositories. Using this tool, we collected and publicly released a dataset of 160K+ commit histories of workflow files in 32K+ public GitHub repositories, covering 1.5M+ workflow file versions. To facilitate its use by other researchers, the dataset includes relevant metadata about the workflow file changes made in each commit. gigawork is publicly released on PyPI, and the associated dataset is available on Zenodo (DOI: 10.5281/zenodo.10259013).
Disciplines:
Computer science
DOI:
10.1145/3643991.3644867
Author, co-author:
Cardoen, Guillaume ; Université de Mons - UMONS > Faculté des Sciences > Service d'Informatique théorique
Mens, Tom ; Université de Mons - UMONS > Faculté des Sciences > Service de Génie Logiciel
Decan, Alexandre ; Université de Mons - UMONS > Faculté des Sciences > Service de Génie Logiciel
Language:
English
Title:
A dataset of GitHub Actions workflow histories
Publication date:
15 April 2024
Event name:
21st International Conference on Mining Software Repositories
Event organizer:
ACM
Event place:
Lisbon, Portugal
Event date:
15-16 April 2024
Event number:
21
Audience:
International
Main work title:
21st International Conference on Mining Software Repositories (MSR '24)