How do open source communities blog?

Empirical Software Engineering

Abstract

We report on an exploratory study that aims to understand how software communities use blogs compared to conventional development infrastructures. We analyzed the behavior of 1,100 bloggers in four large open source communities, distinguishing between committing bloggers and other community members. We observed that these communities use blogs intensively, with one new entry every 8 hours. A blog entry includes 14 times more words than a commit message. When analyzing the content of the blogs, we found that committers and other bloggers write about similar topics. The most popular topics in committers’ blogs represent high-level concepts such as features and domain concepts, while source-code-related topics are discussed in 15% of their posts. Other community members frequently write about community events and conferences as well as configuration and deployment topics. We found that the blogging peak period is usually after the software is released. Moreover, committers are more likely to blog after corrective engineering than after forward engineering and re-engineering activities. Our findings call for hypothesis-driven research to (a) further understand the role of social media in dissolving the collaboration boundaries between developers and other stakeholders and (b) integrate social media into development processes and tools.

Notes

  1. http://www.facebook.com/press/info.php?statistics

  2. www.planetplanet.org

  3. http://www.catalysoft.com/articles/StrikeAMatch.html

  4. planeteclipse.org

  5. planet.gnome.org

  6. planet.postgresql.org and www.planetpostgresql.org

  7. planet.python.org

  8. In an earlier version (Pagano and Maalej 2011) we reported a much smaller rate. After a manual analysis of 100 randomly selected blog posts, we discovered that the original scanning algorithm missed certain blocks. We are more confident with the results presented in this paper.

References

  • Agrawal R, Srikant R (1995) Mining sequential patterns. In: Proceedings of the eleventh international conference on data engineering. IEEE, pp 3–14

  • Agrawal R, Imielinski T, Swami A (1993) Mining association rules between sets of items in large databases. In: Proceedings of the 1993 SIGMOD conference on management of data, ACM, Washington, DC, USA, pp 207–216

  • Bacchelli A, Lanza M, Robbes R (2010) Linking e-mails and source code artifacts. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering—ICSE ’10, p 375

  • Begel A, DeLine R, Zimmermann T (2010) Social media for software engineering. In: Proceedings of the FSE/SDP workshop on future of software engineering research. ACM, pp 33–38

  • Bettenburg N, Adams B, Hassan AE, Smidt M (2011) A lightweight approach to uncover technical artifacts in unstructured data. In: 2011 IEEE 19th international conference on program comprehension, pp 185–188

  • Bettenburg N, Premraj R, Zimmermann T, Kim S (2008) Extracting structural information from bug reports. In: Proceedings of the 2008 international workshop on mining software repositories—MSR ’08, p 27

  • Bird C, Gourley A, Devanbu P, Gertz M, Swaminathan A (2006) Mining email social networks. In: Proceedings of the 2006 international workshop on mining software repositories. ACM, pp 137–143

  • Blei DM, Ng AY, Jordan MI (2003) Latent Dirichlet allocation. J Mach Learn Res 3(4–5):993–1022

  • Crowston K, Heckman R, Annabi H, Masango C (2005) A structurational perspective on leadership in Free/libre open source software teams. In: Proceedings of the 1st conference on open source systems (OSS), Genova, Italy

  • Dice LR (1945) Measures of the amount of ecologic association between species. Ecology 26(3):297–302

  • Glance N, Hurst M, Tomokiyo T (2004) BlogPulse: Automated trend discovery for weblogs. In: Proceedings of the WWW 2004 workshop on the weblogging ecosystem: aggregation, analysis and dynamics, ACM, New York, NY, USA

  • Gruhl D, Liben-Nowell D, Guha R, Tomkins A (2004) Information diffusion through blogspace. ACM SIGKDD Explorations Newsletter 6(2):43–52

  • Guzzi A, Pinzger M, van Deursen A (2010) Combining micro-blogging and IDE interactions to support developers in their quests. In: Proceedings of the 26th international conference on software maintenance (ICSM), IEEE, 2010, pp 1–5

  • Hattori L, Lanza M (2008) On the nature of commits. In: ASE workshops. IEEE, pp 63–71

  • Kaplan AM, Haenlein M (2010) Users of the world, unite! The challenges and opportunities of social media. Bus Horiz 53(1):59–68

  • Maalej W, Happel H-J (2009) From work to word: how do software developers describe their work? In: Working conference on mining software repositories, pp 121–130

  • Maalej W, Happel H-J (2010) Can development work describe itself? In: 2010 7th IEEE working conference on mining software repositories (MSR 2010), pp 191–200

  • Maalej W, Pagano D (2011) On the socialness of software. In: Proceedings of the international conference on social computing and its applications. Sydney, Australia, IEEE

  • Maalej W, Panagiotou D, Happel H-J (2008) Towards effective management of software knowledge exploiting the semantic wiki paradigm. In: Herrmann K, Brügge B (eds) Software engineering. Bonn, Germany, GI, pp 183–197

  • Mockus A, Fielding RT, Herbsleb JD (2002) Two case studies of open source software development: Apache and Mozilla. ACM Trans Softw Eng Methodol 11(3):309–346

  • Pagano D, Maalej W (2011) How do developers blog? an exploratory study. In: Proceedings of the 8th conference on mining software repositories. ACM

  • Parnin C, Treude C (2011) Measuring API documentation on the web. In: Proceeding of the 2nd international workshop on web 2.0 for software engineering, Web2SE ’11. ACM, New York, NY, USA, pp 25–30

  • Song X, Chi Y, Hino K, Tseng B (2007) Identifying opinion leaders in the blogosphere. In: Proceedings of the sixteenth ACM conference on conference on information and knowledge management. ACM, New York, New York, USA, pp 971–974

  • The Nielsen Company (2010) Led by Facebook, Twitter, global time spent on social media sites up 82% year over year

  • Treude C, Storey M-A (2009) How tagging helps bridge the gap between social and technical aspects in software development. In: ICSE ’09: proceedings of the 2009 IEEE 31st international conference on software engineering. IEEE Computer Society, Washington, DC, USA, pp 12–22

  • Tseng B, Tatemura J, Wu Y (2005) Tomographic clustering to visualize blog communities as mountain views. In: WWW 2005 workshop on the weblogging ecosystem. Citeseer

  • van Deursen A, Mesbah A, Cornelissen B, Zaidman A, Pinzger M, Guzzi A (2010) Adinda: a knowledgeable, browser-based IDE. In: Proceedings of the 32nd ACM/IEEE international conference on software engineering. ACM, vol 2, pp 203–206

  • Wasserman S, Faust K (1994) Social network analysis: methods and applications. Cambridge University Press

  • Zaki M (2001) SPADE: an efficient algorithm for mining frequent sequences. Mach Learn 42(1):31–60

Acknowledgements

This work has been supported by the FastFix project, which is funded by the 7th Framework Programme of the European Commission, grant agreement no. FP7-258109. We would like to thank Enrique Garcia Perez, Damir Ismailović, Amel Mahmuzić, Helmut Naughton, Tobias Roehm, Alex Waldmann, and the anonymous MSR’11 and EMSE reviewers for their valuable feedback. We are further thankful to Jonas Helming, Felix Kaser, and Daniel G. Siegel for helpful insights into the Eclipse and GNOME communities.

Author information

Corresponding authors

Correspondence to Dennis Pagano or Walid Maalej.

Additional information

Editors: Arie van Deursen, Tao Xie and Thomas Zimmermann

About this article

Cite this article

Pagano, D., Maalej, W. How do open source communities blog?. Empir Software Eng 18, 1090–1124 (2013). https://doi.org/10.1007/s10664-012-9211-2

  • DOI: https://doi.org/10.1007/s10664-012-9211-2
