Numerous empirical studies analyse evolving open source software (OSS) projects and try to estimate the activity and effort in these projects. Most of these studies, however, focus on a limited set of artefacts, namely source code and defect data. In our research, we extend the analysis by also taking mailing list information into account. The main goal of this article is to find evidence for the Pareto principle in this context by studying how the activity of developers and users involved in OSS projects is distributed: most of the activity appears to be carried out by a small group of people. Following the GQM (goal/question/metric) paradigm, we provide evidence for this principle. We selected a range of metrics used in economics to measure inequality in the distribution of wealth, and adapted them to assess how OSS project activity is distributed. For all three projects we studied, and regardless of whether we analyse version repositories, bug trackers, or mailing lists, the distribution of activity turns out to be highly imbalanced.
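To make the idea of an inequality metric applied to project activity concrete, the following minimal sketch computes the Gini coefficient, a widely used economic inequality measure, over per-contributor activity counts, together with the share of activity due to the most active 20% of contributors. It is an illustration only: the abstract does not name the specific metrics used, so the choice of the Gini coefficient, the function names `gini` and `top_share`, and the commit counts are all assumptions.

```python
from typing import Sequence


def gini(counts: Sequence[float]) -> float:
    """Gini coefficient of non-negative activity counts.

    0 means activity is spread evenly; values near 1 mean it is
    concentrated in very few contributors.
    """
    values = sorted(counts)
    n = len(values)
    total = sum(values)
    if n == 0 or total == 0:
        return 0.0
    # Rank-weighted form over ascending values:
    # G = 2 * sum(i * x_i) / (n * total) - (n + 1) / n
    weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def top_share(counts: Sequence[float], fraction: float = 0.2) -> float:
    """Share of total activity due to the most active `fraction` of contributors."""
    values = sorted(counts, reverse=True)
    total = sum(values)
    if not values or total == 0:
        return 0.0
    k = max(1, round(fraction * len(values)))
    return sum(values[:k]) / total


# Hypothetical commit counts for ten contributors (made-up data,
# not taken from the projects studied in the article).
commits = [120, 45, 10, 5, 4, 3, 2, 2, 1, 1]
print(f"Gini coefficient: {gini(commits):.3f}")
print(f"Top 20% share:    {top_share(commits):.1%}")
```

The same two functions could be applied unchanged to counts of bug tracker actions or mailing list posts per person, which is what allows activity from different kinds of repositories to be compared on one scale.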