Characterizing top ranked code examples in Google

https://doi.org/10.1016/j.jss.2021.110971

Highlights

  • Pages with multiple code examples are more likely to be top ranked by Google.

  • Top ranked code examples are larger and have more API references.

  • Readable and reusable code examples are not necessarily top ranked.

  • Top ranked code examples can be predicted with a good level of confidence.

Abstract

Developers often look for code examples on the web to improve learning and accelerate development. Google indexes millions of pages with code examples: pages with better content are likely to be top ranked. In practice, many factors may influence the rank: page reputation, content quality, etc. Consequently, the most relevant information on the page, i.e., the code example, may be overshadowed by the search engine. Thus, a better understanding of how Google would rank code examples in isolation may provide the basis to detect its strengths and limitations in dealing with such content. In this paper, we assess how the Google search engine ranks code examples. We build a website with 1,000 examples and submit it to Google. After the website is fully indexed, we query and analyze the returned examples. We find that pages with multiple code examples are more likely to be top ranked by Google. Overall, single code examples that are higher ranked are larger; however, they are not necessarily more readable or reusable. We can predict top ranked examples with a good level of confidence, but generic factors are more important than code quality ones. Based on our results, we provide insights for researchers and practitioners.

Introduction

A code example is a snippet of reusable source code that illustrates how a programming problem can be solved (Keivanloo et al., 2014, Menezes et al., 2019). Code examples improve learning (Holmes et al., 2009, Robillard and Deline, 2011), support reuse (Holmes and Walker, 2012, Philip et al., 2012, Yang et al., 2017), and accelerate development (Spring Code Guides, 2021, Vincent, 2018). In practice, developers often rely on web search engines, such as Google, to find code examples (Gu et al., 2016, Hora, 2021b, Kim et al., 2010, Niu et al., 2017, Raghothaman et al., 2016, Sim et al., 2011, Sim et al., 2013, Stolee et al., 2014). Previous studies report that developers may spend up to 20% of their time looking for code examples on the web (Brandt et al., 2009, Niu et al., 2017, Philip et al., 2012). For instance, a popular programming website, W3Schools (W3Schools, 2021), has over 3 billion pageviews per year. Therefore, access to good code examples is essential to software development today (Nasehi et al., 2012).

The Google search engine indexes millions of webpages that include code examples (Treude and Aniche, 2018). Naturally, pages with better content are likely to be top ranked, attracting more attention and clicks from users (Google General Guidelines, 2021, How Search Algorithms Work, 2021, Treude and Aniche, 2018). In practice, many factors may influence the rank: page reputation, page domain, and content quality, to name a few (Furnell and Evans, 2007, Google General Guidelines, 2021, Hannak et al., 2013, Kliman-Silver et al., 2015). However, these factors are inherently hard to enumerate and assess, as search engines do not reveal which particular ones they rely on when determining a website's ranking (Furnell and Evans, 2007). For instance, prior work reports that there are over 200 different factors used by Google to calculate a page’s rank (Furnell and Evans, 2007). Accordingly, the literature has examined several facets of web search engines to better understand how they work, to improve content discovery, or even to assess their fairness. For instance, techniques have been proposed to audit black-box algorithms (Diakopoulos, 2014, Sandvig et al., 2014), and empirical studies have been performed to assess how personalization of web search may affect the results (Hannak et al., 2013, Kliman-Silver et al., 2015), to identify factors related to highly ranked webpages (Furnell and Evans, 2007), to assess partisanship of search results (Diakopoulos et al., 2018, Hu et al., 2019), and to analyze search snippets (Cutrell and Guan, 2007, Kaisser et al., 2008).

In practice, code example webpages are composed not only of code but also of other elements, such as code explanations (see Fig. 1). Indeed, code examples are often enriched with textual descriptions: as Google is a general web search engine, natural language is a way to work around the lack of expressiveness inherent in programming languages (Chen and Zhou, 2018, Gu et al., 2018, Hu et al., 2018, Nasehi et al., 2012, Yao et al., 2019). However, these additional elements may introduce a side effect: we are left unsure about the quality of the code examples themselves. For instance, a webpage with a poor code example could be top ranked due to its good textual description (we present concrete examples in Section 2). Therefore, it is important to understand how Google would rank code examples in isolation, i.e., without any other page elements. In this case, we could query and assess the characteristics of top/bottom ranked code examples and verify their quality aspects, for instance, whether good coding practices (Buse and Weimer, 2009, Martin, 2009, Moreno et al., 2015, Nasehi et al., 2012, Scalabrino et al., 2016, Scalabrino et al., 2018) are found in higher ranked ones. This may provide the basis to detect the possible strengths and limitations of the search engine in dealing with code. While previous studies propose dedicated code search engines (Bajracharya et al., 2006, Codota, 2021, Kim et al., 2010, krugle, 2021, McMillan et al., 2012, SearchCode, 2021) and techniques to rank code examples (Buse and Weimer, 2012, Gu et al., 2018, Hora, 2021a, Keivanloo et al., 2014, Moreno et al., 2015), to the best of our knowledge, no study assesses how Google, the de facto web search engine (Search Engine Market Share Worldwide, 2021), deals with such content.

In this paper, we perform an empirical study to assess how the Google search engine ranks code examples. We analyze the characteristics of the top and bottom ranked code examples returned in response to code search queries. We focus on code examples that describe the usage of APIs, which are often the target of code search (Buse and Weimer, 2012, Parnin et al., 2012, Sadowski et al., 2015). For this purpose, we perform the following steps. First, we select 100 API methods from popular libraries and frameworks. Second, we collect from programming websites 1,000 code examples about the selected APIs, including didactic and real software examples. Third, we build a website, host the code examples on webpages, and submit this website to the Google search engine. Lastly, once the website is fully indexed by Google, we query for APIs and assess the returned code examples in this controlled environment. Specifically, we investigate: (i) the rank of webpages with single and multiple code examples; (ii) the rank of webpages with didactic and real software code examples; (iii) the characteristics of top/bottom ranked code examples, such as their size, readability, reusability, and query similarity; and (iv) whether top ranked code examples can be predicted. We then propose the following research questions:

  • RQ1 (single vs. multiple): How are single and multiple code examples ranked? We find that webpages with multiple code examples are more likely to be top ranked by Google than webpages with single examples. 82% of the webpages with multiple code examples are top ranked.

  • RQ2 (didactic vs. real software): How are didactic and real software code examples ranked? Code examples created for didactic purposes are more likely to be higher ranked than code examples that originate from real software systems. However, this is likely to happen because they have more API references and higher token density, not because they have better quality.

  • RQ3 (top vs. bottom): What are the characteristics of top ranked code examples? Overall, top ranked code examples are larger and have more API references. We find that readable and reusable code examples are not necessarily top ranked.

  • RQ4 (prediction and importance): To what extent can we predict that a code example will be top ranked? What are the most important characteristics? We can predict top ranked code examples with a good level of confidence (in the best case, precision: 79%, recall: 70%, and AUC: 89%). Generic factors (e.g., term frequency and size) are more important than code quality factors (e.g., reusability).
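The three measures reported for RQ4 can be illustrated with a minimal sketch. It assumes a binary label per code example (1 = top ranked, 0 = not) and a classifier that outputs a score per example; the function names and the toy data are hypothetical and are not taken from the study, whose actual model and features are not shown in this excerpt.

```python
from typing import List, Tuple


def precision_recall(y_true: List[int], y_pred: List[int]) -> Tuple[float, float]:
    """Precision and recall for the positive class (1 = top ranked)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive example is scored
    higher than a random negative one (ties count as one half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical labels and classifier scores for eight code examples.
    labels = [1, 1, 1, 0, 0, 0, 1, 0]
    scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.1]
    predicted = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 cut-off
    print(precision_recall(labels, predicted))
    print(auc(labels, scores))
```

Precision and recall depend on the chosen cut-off, whereas AUC summarizes how well the scores separate top ranked from other examples across all cut-offs, which is why the three values are reported together.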

Based on our results, we provide insights to drive future research on code search. Moreover, we provide insights to improve the user experience of code example webpages, which is a practice encouraged by Google to benefit users and facilitate content discovery (Search Engine Optimization (SEO) Starter Guide, 2021).

Contributions. This study has three major contributions: (i) we provide the first empirical study to assess how the Google search engine ranks single/multiple and didactic/real software code examples (Sections 4.1 and 4.2); (ii) we study factors associated with top/bottom ranked code examples and investigate whether these factors can predict rank positions (Sections 4.3 and 4.4); and (iii) we provide guidelines to improve code example webpages and present insights to code search researchers (Section 5).

Structure of the paper. Section 2 presents a motivating example. Section 3 describes our study design. Section 4 presents our results and Section 5 discusses them. Section 6 states the threats to validity. Finally, Section 7 presents the related work and Section 8 concludes the paper.

Motivating examples

Developers often look for code examples on the web (Gu et al., 2016, Sim et al., 2011, Sim et al., 2013, Stolee et al., 2014). They are commonly interested in how to use APIs provided by libraries and frameworks (Buse and Weimer, 2012, Parnin et al., 2012, Sadowski et al., 2015). Typically, a code search query consists of API tokens, i.e., class and method names (Niu et al., 2017). For example, if a developer desires to retrieve code examples about an API, for instance, File.mkdirs,2

Study design

Fig. 3 presents an overview of the proposed approach to assess how the Google search engine ranks code examples. It includes five major steps: (1) selecting APIs, (2) collecting code examples, (3) indexing code examples, (4) querying code examples, and (5) assessing query results. We detail each step in the following subsections. Our results are publicly available.

RQ1: How are single and multiple code examples ranked?

Fig. 6 presents the performance of the Google search engine for the metric FRank: 82% of the webpages with multiple code examples (i.e., the hits) are top ranked. That is, webpages with multiple code examples are more likely to be top ranked by the Google search engine than webpages with single examples.
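To make the measurement concrete, the following is a minimal sketch under two assumptions that this excerpt does not confirm: that FRank denotes the position of the first hit in a result list (its usual meaning in code search evaluation), and that "top ranked" means the first hit appears at position 1. The function names and page names are hypothetical.

```python
from typing import List, Optional, Sequence, Set, Tuple


def frank(results: Sequence[str], hits: Set[str]) -> Optional[int]:
    """1-based position of the first hit in a result list, or None if no hit
    is returned (assumed definition of FRank)."""
    for position, page in enumerate(results, start=1):
        if page in hits:
            return position
    return None


def top_ranked_share(queries: List[Tuple[Sequence[str], Set[str]]], top_k: int = 1) -> float:
    """Fraction of queries whose first hit appears within the first top_k results."""
    if not queries:
        raise ValueError("at least one query is required")
    ranks = [frank(results, hits) for results, hits in queries]
    top = sum(1 for r in ranks if r is not None and r <= top_k)
    return top / len(ranks)


if __name__ == "__main__":
    # Hypothetical result lists; the hit is the page with multiple examples.
    queries = [
        (["multi.html", "single-a.html", "single-b.html"], {"multi.html"}),
        (["multi.html", "single-b.html", "single-a.html"], {"multi.html"}),
        (["single-a.html", "multi.html", "single-b.html"], {"multi.html"}),
        (["multi.html", "single-a.html", "single-b.html"], {"multi.html"}),
    ]
    print(top_ranked_share(queries))
```

With one query per API, the share returned by `top_ranked_share` corresponds to the kind of percentage reported above, under the stated assumptions.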

Moreover, this ratio tends to be constant over time, as presented in Fig. 7. We find no major difference during the first 15 days of analysis in June 2019 (black bars): it started with 79% on the

Discussion and implications

Based on our results, we provide implications for practitioners and researchers. First, we present guidelines that can be applied by practitioners to improve code example webpages of programming websites. Then, we present insights to code search researchers.

Threats to validity

Generalization of the results. We assess the Google search engine and code examples implemented in Java. Google dominates the web, with more than 92% of the search market share (Search Engine Market Share Worldwide, 2021), while Java is among the most popular languages nowadays. Despite these observations, our findings, as usual in empirical software engineering, cannot be directly generalized to other search engines (e.g., Bing, Yahoo!, Baidu) or to code examples written in other languages.

Related work

Several commercial code search tools exist in the market. Nowadays, developers can search for code in tools such as SearchCode (SearchCode, 2021), ProgramCreek (ProgramCreek, 2021), and Krugle (krugle, 2021). Code is also easy to find in online version control platforms, such as GitHub. Over time, other code search tools were discontinued, such as Codase (Codase, 2021) and OpenHub Code (OpenHub, 2021) (previously known as Koders and Ohloh). The Google Code Search is perhaps the most

Conclusion

Code examples are often provided by programming websites to support software development. However, due to many factors found in webpages of programming websites (e.g., code explanations), code examples can be overshadowed by search engines. Thus, it is important to understand how code examples would be ranked in isolation, i.e., without any other page elements. In this case, we could query and assess the characteristics of top/bottom ranked code examples and verify their quality aspects. In

CRediT authorship contribution statement

Andre Hora: Conceptualization, Methodology, Data curation, Investigation, Software, Validation, Visualization, Writing - original draft, Writing - review and editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This research is supported by CAPES and CNPq.

Andre Hora is an Assistant Professor in the Computer Science Department at the Federal University of Minas Gerais (UFMG), Brazil. His research interests include software evolution, software repository mining, and empirical software engineering. He earned his Ph.D. in Computer Science from the University of Lille, France. Webpage: www.dcc.ufmg.br/ andrehora.

References (100)

  • Couto, C. et al. Predicting software defects with causality tests. J. Syst. Softw. (2014)
  • Kim, J. et al. Enriching documents with examples: A corpus mining approach. Trans. Inf. Syst. (2013)
  • Baeldung (2021)
  • Bajracharya, S. et al. Mining search topics from a code search engine usage log
  • Bajracharya, S.K. et al. Analyzing and mining a code search engine usage log. Empir. Softw. Eng. (2012)
  • Bajracharya, S. et al. Sourcerer: a search engine for open source code supporting structure-based search
  • Bansal, C. et al. The usage of web search for software engineering (2019)
  • Brandt, J. et al. Example-centric programming: Integrating web search into the development environment
  • Brandt, J. et al. Two studies of opportunistic programming: Interleaving web foraging, learning, and writing code
  • Buse, R.P. et al. Learning a metric for code readability. IEEE Trans. Softw. Eng. (2009)
  • Buse, R.P. et al. Synthesizing API usage examples
  • Chen, Q. et al. A neural framework for retrieval and summarization of source code
  • Codase (2021)
  • Codota (2021)
  • Creating a programmable search engine (2021)
  • Cutrell, E. et al. What are you looking for?: an eye-tracking study of information usage in web search
  • Diakopoulos, N. Algorithmic accountability reporting: On the investigation of black boxes (2014)
  • Diakopoulos, N. et al. I vote for? How search informs our choice of candidate
  • Dias, M. et al. Untangling fine-grained code changes
  • Fischer, F. et al. Stack overflow considered harmful? the impact of copy&paste on android application security
  • Furnell, S. et al. Analysing google rankings through search engine optimization data. Internet Res. (2007)
  • (2021)
  • Google code search google blog shutting down code search (2021)
  • Google general guidelines (2021)
  • Google search api (2021)
  • Grechanik, M. et al. Exemplar: Executable examples archive
  • Grechanik, M. et al. A search engine for finding highly relevant applications
  • Gu, X. et al. Deep code search
  • Gu, X. et al. Deep API learning
  • Hannak, A. et al. Measuring personalization of web search
  • Holmes, R. et al. The end-to-end use of source code examples: An exploratory study
  • Holmes, R. et al. Systematizing pragmatic software reuse. ACM Trans. Softw. Eng. Methodol. (2012)
  • Hora, A. Apisonar: Mining API usage examples. Softw. - Pract. Exp. (2021)
  • Hora, A. Googling for software development: What developers search for and what they find
  • Hora, A. et al. Apiwave: Keeping track of API popularity and migration
  • How search algorithms work (2021)
  • Hu, D. et al. Auditing the partisanship of google search snippets
  • Hu, X. et al. Deep code comment generation
  • Introduction to indexing (2021)
  • Jiarpakdee, J. et al. The impact of correlated metrics on the interpretation of defect models. IEEE Trans. Softw. Eng. (2019)
  • Kaisser, M. et al. Improving search results quality by customizing summary lengths
  • Keivanloo, I. et al. Spotting working code examples
  • Kim, J. et al. Towards an intelligent code search engine
  • Kim, S. et al. Classifying software changes: Clean or buggy? IEEE Trans. Softw. Eng. (2008)
  • Kliman-Silver, C. et al. Location, location, location: The impact of geolocation on web search personalization
  • Krugle (2021)
  • Lessmann, S. et al. Benchmarking classification models for software defect prediction: A proposed framework and novel findings. Trans. Softw. Eng. (2008)
  • Lethbridge, T.C. et al. How software engineers use documentation: The state of the practice. IEEE Softw. (2003)
  • Liaw, A. et al. Classification and regression by randomforest. R News (2002)
  • Lima, C. et al. What are the characteristics of popular apis? A large scale study on java, android, and 165 libraries. Softw. Qual. J. (2020)
Editor: Gabriele Bavota.
