Abstract
We report on an exploratory study that aims to understand how software communities use blogs compared to conventional development infrastructures. We analyzed the behavior of 1,100 bloggers in four large open source communities, distinguishing between committing bloggers and other community members. We observed that these communities use blogs intensively, with one new entry every 8 h. A blog entry includes 14 times more words than a commit message. When analyzing the content of the blogs, we found that committers and other bloggers write about similar topics. The most popular topics in committers’ blogs represent high-level concepts such as features and domain concepts, while source code related topics are discussed in 15% of their posts. Other community members frequently write about community events and conferences as well as configuration and deployment topics. We found that the blogging peak period is usually after the software is released. Moreover, committers are more likely to blog after corrective engineering than after forward engineering and re-engineering activities. Our findings call for hypothesis-driven research to (a) further understand the role of social media in dissolving the collaboration boundaries between developers and other stakeholders and (b) integrate social media into development processes and tools.
Notes
In an earlier version (Pagano and Maalej 2011) we reported a much smaller rate. After a manual analysis of 100 randomly selected blog posts, we discovered that the original scanning algorithm missed certain blocks. We are more confident in the results presented in this paper.
References
Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the eleventh international conference on data engineering. IEEE, pp 3–14
Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 SIGMOD conference on management of data, ACM, Washington, DC, USA, pp 207–216
Bacchelli A, Lanza M, Robbes R (2010) Linking e-mails and source code artifacts. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering—ICSE ’10, p 375
Begel A, DeLine R, Zimmermann T (2010) Social media for software engineering. In: Proceedings of the FSE/SDP workshop on future of software engineering research. ACM, pp 33–38
Bettenburg N, Adams B, Hassan AE, Smidt M (2011) A lightweight approach to uncover technical artifacts in unstructured data. In: 2011 IEEE 19th international conference on program comprehension, pp 185–188
Bettenburg N, Premraj R, Zimmermann T, Kim S (2008) Extracting structural information from bug reports. In: Proceedings of the 2008 international workshop on mining software repositories—MSR ’08, p 27
Bird C, Gourley A, Devanbu P, Gertz M, Swaminathan A (2006) Mining email social networks. In: Proceedings of the 2006 international workshop on mining software repositories. ACM, pp 137–143
Blei DM, Ng AY, Jordan MI (2003) Latent dirichlet allocation. J Mach Learn Res 3(4–5):993–1022
Crowston K, Heckman R, Annabi H, Masango C (2005) A structurational perspective on leadership in Free/libre open source software teams. In: Proceedings of the 1st conference on open source systems (OSS), Genova, Italy
Dice LR (1945) Measures of the amount of ecologic association between species. Ecology 26(3):297–302
Glance N, Hurst M, Tomokiyo T (2004) BlogPulse: Automated trend discovery for weblogs. In: Proceedings of the WWW 2004 workshop on the weblogging ecosystem: aggregation, analysis and dynamics, ACM, New York, NY, USA
Gruhl D, Liben-Nowell D, Guha R, Tomkins A (2004) Information diffusion through blogspace. ACM SIGKDD Explorations Newsletter 6(2):43–52
Guzzi A, Pinzger M, van Deursen A (2010) Combining micro-blogging and IDE interactions to support developers in their quests. In: Proceedings of the 26th international conference on software maintenance (ICSM). IEEE, pp 1–5
Hattori L, Lanza M (2008) On the nature of commits. In: ASE workshops. IEEE, pp 63–71
Kaplan AM, Haenlein M (2010) Users of the world, unite! The challenges and opportunities of social media. Bus Horiz 53(1):59–68
Maalej W, Happel H (2009) From work to word: how do software developers describe their work? In: Working conference on mining software repositories, pp 121–130
Maalej W, Happel H-J (2010) Can development work describe itself? In: 2010 7th IEEE working conference on mining software repositories (MSR 2010), pp 191–200
Maalej W, Pagano D (2011) On the socialness of software. In: Proceedings of the international conference on social computing and its applications. Sydney, Australia, IEEE
Maalej W, Panagiotou D, Happel H-J (2008) Towards effective management of software knowledge exploiting the semantic wiki paradigm. In: Herrmann K, Brügge B (eds) Software engineering. Bonn, Germany, GI, pp 183–197
Mockus A, Fielding RT, Herbsleb JD (2002) Two case studies of open source software development: Apache and Mozilla. ACM Trans Softw Eng Methodol 11(3):309–346
Pagano D, Maalej W (2011) How do developers blog? an exploratory study. In: Proceedings of the 8th conference on mining software repositories. ACM
Parnin C, Treude C (2011) Measuring API documentation on the web. In: Proceeding of the 2nd international workshop on web 2.0 for software engineering, Web2SE ’11. ACM, New York, NY, USA, pp 25–30
Song X, Chi Y, Hino K, Tseng B (2007) Identifying opinion leaders in the blogosphere. In: Proceedings of the sixteenth ACM conference on conference on information and knowledge management. ACM, New York, New York, USA, pp 971–974
The Nielsen Company (2010) Led by Facebook, Twitter, global time spent on social media sites up 82% year over year
Treude C, Storey M-A (2009) How tagging helps bridge the gap between social and technical aspects in software development. In: ICSE ’09: proceedings of the 2009 IEEE 31st international conference on software engineering. IEEE Computer Society, Washington, DC, USA, pp 12–22
Tseng B, Tatemura J, Wu Y (2005) Tomographic clustering to visualize blog communities as mountain views. In: WWW 2005 workshop on the weblogging ecosystem. Citeseer
van Deursen A, Mesbah A, Cornelissen B, Zaidman A, Pinzger M, Guzzi A (2010) Adinda: a knowledgeable, browser-based IDE. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering. ACM, vol 2, pp 203–206
Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press
Zaki M (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 42(1):31–60
Acknowledgements
This work has been supported by the FastFix project, which is funded by the 7th Framework Programme of the European Commission, grant agreement no. FP7-258109. We would like to thank Enrique Garcia Perez, Damir Ismailović, Amel Mahmuzić, Helmut Naughton, Tobias Roehm, Alex Waldmann, and the anonymous MSR’11 and EMSE reviewers for their valuable feedback. We are further thankful to Jonas Helming, Felix Kaser, and Daniel G. Siegel for helpful insights into the Eclipse and GNOME communities.
Additional information
Editors: Arie van Deursen, Tao Xie and Thomas Zimmermann
About this article
Cite this article
Pagano, D., Maalej, W. How do open source communities blog?. Empir Software Eng 18, 1090–1124 (2013). https://doi.org/10.1007/s10664-012-9211-2
DOI: https://doi.org/10.1007/s10664-012-9211-2