[en] Collaborative software development relies on social coding platforms that integrate CI/CD tools to automate repetitive development tasks. This article focuses on GitHub and its CI/CD service GitHub Actions, which supports and promotes reusable Actions and reusable workflows to avoid duplication of workflow configuration code. Despite these reuse mechanisms, copy-paste practices persist, leading to a large number of Type I and Type II clones in workflow files. We conduct the first large-scale quantitative clone analysis across 118K GitHub Actions workflows in 34K GitHub repositories, containing 364K instances of clones. We study the prevalence of such clones in workflows, the specific workflow entities that are more prone to cloning, the distribution of clones at different levels of granularity, and the extent to which clones could be mitigated by the reuse mechanisms offered by GitHub. Our findings reveal that two-thirds of workflows contain clones, the majority of which could be refactored using built-in reuse mechanisms.
Disciplines :
Computer science
DOI :
10.1109/SANER67736.2026.00028
Author, co-author :
Cardoen, Guillaume ; Université de Mons - UMONS > Faculté des Sciences > Service d'Informatique théorique
Mens, Tom ; Université de Mons - UMONS > Faculté des Sciences > Service de Génie Logiciel
Decan, Alexandre ; Université de Mons - UMONS > Faculté des Sciences > Service de Génie Logiciel
Language :
English
Title :
An Empirical Analysis of Code Clones in GitHub Actions Workflows
Publication date :
17 March 2026
Event name :
The 33rd IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2026)
Event place :
Limassol, Cyprus
Event date :
from 17 to 20 March 2026
Audience :
International
Main work title :
Proceedings SANER 2026
Publisher :
IEEE
Peer review/Selection committee :
Peer reviewed
Research unit :
S852 - Génie Logiciel
Research institute :
R300 - Institut de Recherche en Technologies de l'Information et Sciences de l'Informatique