Characterizing top ranked code examples in Google☆
Introduction
A code example is a snippet of reusable source code that illustrates how a programming problem can be solved (Keivanloo et al., 2014, Menezes et al., 2019). Code examples improve learning (Holmes et al., 2009, Robillard and DeLine, 2011), support reuse (Holmes and Walker, 2012, Philip et al., 2012, Yang et al., 2017), and accelerate development (Spring Code Guides, 2021, Vincent, 2018). In practice, developers often rely on web search engines, such as Google, to find code examples (Gu et al., 2016, Hora, 2021b, Kim et al., 2010, Niu et al., 2017, Raghothaman et al., 2016, Sim et al., 2011, Sim et al., 2013, Stolee et al., 2014). Previous studies report that developers may spend up to 20% of their time looking for code examples on the web (Brandt et al., 2009, Niu et al., 2017, Philip et al., 2012). For instance, W3Schools (W3Schools, 2021), a popular programming website, has over 3 billion pageviews per year.¹ Therefore, access to good code examples is essential to software development today (Nasehi et al., 2012).
The Google search engine indexes millions of webpages that include code examples (Treude and Aniche, 2018). Naturally, pages with better content are likely to be top ranked, attracting more attention and clicks from users (Google General Guidelines, 2021, How Search Algorithms Work, 2021, Treude and Aniche, 2018). In practice, many factors may influence the rank, such as page reputation, page domain, and content quality (Furnell and Evans, 2007, Google General Guidelines, 2021, Hannak et al., 2013, Kliman-Silver et al., 2015). However, these factors are inherently hard to enumerate and assess because search engines do not disclose which ones they rely on when ranking a website (Furnell and Evans, 2007). For instance, prior work reports that Google uses over 200 different factors to calculate a page's rank (Furnell and Evans, 2007). Consequently, the literature has examined several facets of web search engines to better understand how they work, to improve content discovery, or even to assess their fairness. For instance, techniques have been proposed to audit black-box algorithms (Diakopoulos, 2014, Sandvig et al., 2014), and empirical studies have assessed how web search personalization may affect the results (Hannak et al., 2013, Kliman-Silver et al., 2015), identified factors related to highly ranked webpages (Furnell and Evans, 2007), assessed the partisanship of search results (Diakopoulos et al., 2018, Hu et al., 2019), and analyzed search snippets (Cutrell and Guan, 2007, Kaisser et al., 2008).
In practice, code example webpages contain not only code but also other elements, such as code explanations (see Fig. 1). Indeed, code examples are often enriched with textual descriptions: since Google is a general-purpose web search engine, natural language helps overcome the limited expressiveness inherent to programming languages (Chen and Zhou, 2018, Gu et al., 2018, Hu et al., 2018, Nasehi et al., 2012, Yao et al., 2019). However, these additional elements may have a side effect: we are left unsure about the quality of the code examples themselves. For instance, a webpage with a poor code example could be top ranked due to its good textual description (we present concrete examples in Section 2). Therefore, it is important to understand how Google would rank code examples in isolation, i.e., without any other page elements. In this setting, we could query and assess the characteristics of top/bottom ranked code examples and verify their quality aspects, for instance, whether good coding practices (Buse and Weimer, 2009, Martin, 2009, Moreno et al., 2015, Nasehi et al., 2012, Scalabrino et al., 2016, Scalabrino et al., 2018) are found in higher ranked ones. This may provide a basis for detecting possible strengths and limitations of the search engine in dealing with code. While previous studies propose dedicated code search engines (Bajracharya et al., 2006, Codota, 2021, Kim et al., 2010, krugle, 2021, McMillan et al., 2012, SearchCode, 2021) and techniques to rank code examples (Buse and Weimer, 2012, Gu et al., 2018, Hora, 2021a, Keivanloo et al., 2014, Moreno et al., 2015), to the best of our knowledge, no study assesses how Google, the de facto web search engine (Search Engine Market Share Worldwide, 2021), deals with such content.
In this paper, we perform an empirical study to assess how the Google search engine ranks code examples. We analyze the characteristics of the top and bottom ranked code examples returned in response to code search queries. We focus on code examples that describe the usage of APIs, which are often the target of code search (Buse and Weimer, 2012, Parnin et al., 2012, Sadowski et al., 2015). For this purpose, we perform the following steps. First, we select 100 API methods from popular libraries and frameworks. Second, we collect 1000 code examples about the selected APIs from programming websites, including didactic and real software examples. Third, we build a website, host the code examples on its webpages, and submit this website to the Google search engine. Lastly, after the website is fully indexed by Google, we query for the APIs and assess the returned code examples in this controlled environment. Specifically, we investigate: (i) the rank of webpages with single and multiple code examples; (ii) the rank of webpages with didactic and real software code examples; (iii) the characteristics of top/bottom ranked code examples, such as their size, readability, reusability, and query similarity; and (iv) whether top ranked code examples can be predicted. We address the following research questions:
- RQ1 (single vs. multiple): How are single and multiple code examples ranked? We find that webpages with multiple code examples are more likely to be top ranked by Google than webpages with single examples: 82% of the webpages with multiple code examples are top ranked.
- RQ2 (didactic vs. real software): How are didactic and real software code examples ranked? Code examples created for didactic purposes are more likely to be higher ranked than code examples originating from real software systems. However, this likely happens because they contain more API references and have a higher token density, not because they are of better quality.
- RQ3 (top vs. bottom): What are the characteristics of top ranked code examples? Overall, top ranked code examples are larger and have more API references. We find that readable and reusable code examples are not necessarily top ranked.
- RQ4 (prediction and importance): To what extent can we predict that a code example will be top ranked? What are the most important characteristics? We can predict top ranked code examples with a good level of confidence (in the best case, precision: 79%, recall: 70%, and AUC: 89%). Generic factors (e.g., term frequency and size) are more important than code quality factors (e.g., reusability).
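RQ4 summarizes prediction quality with precision, recall, and AUC. As a rough illustration of how these three numbers are derived from a classifier's output, the minimal sketch below computes them for a binary "top ranked" label. The labels, scores, and the 0.5 decision threshold are hypothetical assumptions, and the classifier and features actually used in the study are not shown in this section.

```python
"""Minimal sketch: precision, recall, and AUC for a binary 'top ranked' label.

The toy labels and scores are hypothetical, not data from the study.
"""
from typing import Sequence


def precision_recall(labels: Sequence[int], scores: Sequence[float],
                     threshold: float = 0.5) -> tuple[float, float]:
    """Precision and recall of the positive class (1 = top ranked)."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, predicted) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative (ties count as half), i.e., the Mann-Whitney formulation."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical ground truth (1 = top ranked) and classifier scores.
    y = [1, 1, 1, 0, 0, 0, 1, 0]
    s = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1, 0.7, 0.3]
    p, r = precision_recall(y, s)
    print(f"precision={p:.2f} recall={r:.2f} auc={auc(y, s):.2f}")
```

Precision and recall depend on the chosen threshold, while AUC is threshold-free, which is one reason the three are typically reported together. The pairwise AUC loop is quadratic in the number of examples; for larger data sets, a library routine such as scikit-learn's `roc_auc_score` is the usual choice.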
Based on our results, we provide insights to drive future research on code search. Moreover, we provide insights to improve the user experience of code example webpages, which is a practice encouraged by Google to benefit users and facilitate content discovery (Search Engine Optimization (SEO) Starter Guide, 2021).
Contributions. This study has three major contributions: (i) we provide the first empirical study to assess how the Google search engine ranks single/multiple and didactic/real software code examples (Sections 4.1 and 4.2); (ii) we study factors associated with top/bottom ranked code examples and investigate whether these factors can predict rank positions (Sections 4.3 and 4.4); and (iii) we provide guidelines to improve code example webpages and present insights to code search researchers (Section 5).
Structure of the paper. Section 2 presents a motivating example. Section 3 describes our study design. Section 4 presents our results and Section 5 discusses them. Section 6 states the threats to validity. Finally, Section 7 presents the related work and Section 8 concludes the paper.
Motivating examples
Developers often look for code examples on the web (Gu et al., 2016, Sim et al., 2011, Sim et al., 2013, Stolee et al., 2014). They are commonly interested in how to use APIs provided by libraries and frameworks (Buse and Weimer, 2012, Parnin et al., 2012, Sadowski et al., 2015). Typically, a code search query consists of API tokens, i.e., class and method names (Niu et al., 2017). For example, if a developer desires to retrieve code examples about an API, for instance, File.mkdirs,²
Study design
Fig. 3 presents an overview of the proposed approach to assess how the Google search engine ranks code examples. It includes five major steps: (1) selecting APIs, (2) collecting code examples, (3) indexing code examples, (4) querying code examples, and (5) assessing query results. We detail each step in the following subsections. Our results are publicly available.⁷
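Steps (4) and (5) can be illustrated with a minimal sketch. Following the motivating example, it assumes that a query is built from API tokens (class and method names), and it computes a few characteristics of the kind examined in RQ2 and RQ3: size, API references, token density, and query similarity. The helper names, the site-restricted query on the hypothetical domain `example.org`, the regex tokenizer, and the specific definitions (token density as the share of code tokens that are query terms, query similarity as cosine over term counts) are all illustrative assumptions rather than the study's definitions. The reference list also mentions a programmable search engine and a Google search API, so the actual querying mechanism may well differ from a plain search URL.

```python
"""Sketch of querying (step 4) and assessing (step 5) a code example.

Every definition below is an illustrative assumption, not the study's metric.
"""
import math
import re
from collections import Counter
from urllib.parse import urlencode

# Identifier-like tokens; string contents are not skipped (a simplification).
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def split_api(api: str) -> tuple[str, str]:
    """'java.io.File.mkdirs' or 'File.mkdirs' -> ('File', 'mkdirs')."""
    owner, method = api.rsplit(".", 1)
    return owner.rsplit(".", 1)[-1], method


def build_query(api: str, site: str = "example.org") -> str:
    """Build a search URL from the API's class and method names, restricted
    to a hypothetical test website with Google's site: operator."""
    cls, method = split_api(api)
    return "https://www.google.com/search?" + urlencode(
        {"q": f"{cls} {method} site:{site}"})


def characteristics(code: str, api: str) -> dict[str, float]:
    """Simple, hypothetical characteristics of a code example for an API."""
    cls, method = split_api(api)
    tokens = TOKEN_RE.findall(code)
    non_blank = [ln for ln in code.splitlines() if ln.strip()]
    # API references: call sites of the method, e.g. 'dir.mkdirs('.
    api_refs = len(re.findall(rf"\b{re.escape(method)}\s*\(", code))
    query = Counter([cls, method])
    counts = Counter(tokens)
    # Token density: share of code tokens that are query terms.
    density = sum(counts[t] for t in query) / len(tokens) if tokens else 0.0
    # Query similarity: cosine between query and code term-count vectors.
    dot = sum(query[t] * counts[t] for t in query)
    norm = (math.sqrt(sum(v * v for v in query.values()))
            * math.sqrt(sum(v * v for v in counts.values())))
    return {
        "size_loc": float(len(non_blank)),
        "api_refs": float(api_refs),
        "token_density": density,
        "query_similarity": dot / norm if norm else 0.0,
    }


if __name__ == "__main__":
    example = (
        'File dir = new File("out/logs");\n'
        "if (!dir.exists()) {\n"
        "    dir.mkdirs();\n"
        "}\n"
    )
    print(build_query("File.mkdirs"))
    print(characteristics(example, "File.mkdirs"))
```

For the four-line `File.mkdirs` example above, the sketch counts 4 non-blank lines and 1 call site of `mkdirs`; richer metrics such as readability or reusability would require dedicated models and are deliberately left out.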
RQ1: How are single and multiple code examples ranked?
Fig. 6 presents the performance of the Google search engine for the metric FRank: 82% of the webpages with multiple code examples (i.e., the hits) are top ranked. That is, webpages with multiple code examples are more likely to be top ranked by the Google search engine than webpages with single examples.
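The 82% figure is a proportion of hits. The precise definition of FRank belongs to the study design and is not reproduced in this section, so the minimal sketch below relies on an assumption: each API query returns one page with a single code example and one page with multiple code examples, and a hit means that the multiple-example page is ranked above the single-example page. The page identifiers and result lists are hypothetical.

```python
"""Sketch: share of API queries in which the multiple-example page is ranked
above the single-example page. The 'hit' rule and data are assumptions."""
from typing import Optional


def rank_of(results: list[str], page: str) -> Optional[int]:
    """1-based position of `page` in a ranked result list, or None if absent."""
    return results.index(page) + 1 if page in results else None


def is_hit(results: list[str], multi_page: str, single_page: str) -> bool:
    """Hit: the multi-example page appears and is ranked above the single one.
    A page missing from the results counts as ranked below every listed page."""
    r_multi = rank_of(results, multi_page)
    r_single = rank_of(results, single_page)
    if r_multi is None:
        return False
    return r_single is None or r_multi < r_single


def hit_ratio(queries: list[tuple[list[str], str, str]]) -> float:
    """Fraction of (results, multi_page, single_page) triples that are hits."""
    if not queries:
        raise ValueError("no queries to evaluate")
    return sum(is_hit(r, m, s) for r, m, s in queries) / len(queries)


if __name__ == "__main__":
    # Hypothetical result lists returned for three API queries.
    queries = [
        (["mkdirs-multi", "mkdirs-single"], "mkdirs-multi", "mkdirs-single"),
        (["split-single", "split-multi"], "split-multi", "split-single"),
        (["sort-multi"], "sort-multi", "sort-single"),
    ]
    print(f"hit ratio: {hit_ratio(queries):.0%}")  # 2 of 3 queries are hits
```

Treating an unreturned page as ranked below every listed page is one design choice among several; excluding such queries altogether would be an equally plausible alternative and would change the ratio.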
Moreover, this ratio tends to remain constant over time, as presented in Fig. 7. We find no major difference during the first 15 days of analysis in June 2019 (black bars): it started with 79% on the
Discussion and implications
Based on our results, we provide implications for practitioners and researchers. First, we present guidelines that can be applied by practitioners to improve code example webpages of programming websites. Then, we present insights to code search researchers.
Threats to validity
Generalization of the results. We assess the Google search engine and code examples implemented in Java. Google dominates web search, with more than 92% of the search market share (Search Engine Market Share Worldwide, 2021), while Java is among the most popular languages today. Despite these observations, our findings, as usual in empirical software engineering, cannot be directly generalized to other search engines (e.g., Bing, Yahoo!, Baidu) or to code examples written in other languages.
Related work
Several commercial code search tools exist on the market. Today, developers can use code search tools such as SearchCode (SearchCode, 2021), ProgramCreek (ProgramCreek, 2021), and Krugle (krugle, 2021). Code is also easy to find on online version control platforms, such as GitHub. Over time, other code search tools were discontinued, such as Codase (Codase, 2021) and OpenHub Code (OpenHub, 2021) (previously known as Koders and Ohloh). Google Code Search is perhaps the most
Conclusion
Code examples are often provided by programming websites to support software development. However, because webpages of programming websites contain many other elements (e.g., code explanations), the code examples themselves may be overshadowed in search engine rankings. Thus, it is important to understand how code examples would be ranked in isolation, i.e., without any other page elements. In this setting, we could query and assess the characteristics of top/bottom ranked code examples and verify their quality aspects. In
CRediT authorship contribution statement
Andre Hora: Conceptualization, Methodology, Data curation, Investigation, Software, Validation, Visualization, Writing - original draft, Writing - review and editing.
Declaration of Competing Interest
The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Acknowledgments
This research is supported by CAPES and CNPq.
Andre Hora is an Assistant Professor in the Computer Science Department at the Federal University of Minas Gerais (UFMG), Brazil. His research interests include software evolution, software repository mining, and empirical software engineering. He earned his Ph.D. in Computer Science from the University of Lille, France. Webpage: www.dcc.ufmg.br/ andrehora.
References
- Predicting software defects with causality tests. J. Syst. Softw. (2014).
- Enriching documents with examples: A corpus mining approach. Trans. Inf. Syst. (2013).
- Baeldung (2021).
- Mining search topics from a code search engine usage log.
- Analyzing and mining a code search engine usage log. Empir. Softw. Eng. (2012).
- Sourcerer: a search engine for open source code supporting structure-based search.
- The usage of web search for software engineering (2019).
- Example-centric programming: Integrating web search into the development environment.
- Two studies of opportunistic programming: Interleaving web foraging, learning, and writing code.
- Learning a metric for code readability. IEEE Trans. Softw. Eng. (2009).
- Synthesizing API usage examples.
- A neural framework for retrieval and summarization of source code.
- Codase.
- Codota.
- Creating a programmable search engine.
- What are you looking for?: an eye-tracking study of information usage in web search.
- Algorithmic accountability reporting: On the investigation of black boxes.
- I vote for? How search informs our choice of candidate.
- Untangling fine-grained code changes.
- Stack Overflow considered harmful? The impact of copy&paste on Android application security.
- Analysing Google rankings through search engine optimization data. Internet Res.
- Google Code Search. Google blog: shutting down Code Search.
- Google general guidelines.
- Google search API.
- Exemplar: Executable examples archive.
- A search engine for finding highly relevant applications.
- Deep code search.
- Deep API learning.
- Measuring personalization of web search.
- The end-to-end use of source code examples: An exploratory study.
- Systematizing pragmatic software reuse. ACM Trans. Softw. Eng. Methodol.
- Apisonar: Mining API usage examples. Softw. - Pract. Exp.
- Googling for software development: What developers search for and what they find.
- Apiwave: Keeping track of API popularity and migration.
- How search algorithms work.
- Auditing the partisanship of Google search snippets.
- Deep code comment generation.
- Introduction to indexing.
- The impact of correlated metrics on the interpretation of defect models. IEEE Trans. Softw. Eng.
- Improving search results quality by customizing summary lengths.
- Spotting working code examples.
- Towards an intelligent code search engine.
- Classifying software changes: Clean or buggy? IEEE Trans. Softw. Eng.
- Location, location, location: The impact of geolocation on web search personalization.
- Krugle.
- Benchmarking classification models for software defect prediction: A proposed framework and novel findings. Trans. Softw. Eng.
- How software engineers use documentation: The state of the practice. IEEE Softw.
- Classification and regression by randomForest. R News.
- What are the characteristics of popular APIs? A large scale study on Java, Android, and 165 libraries. Softw. Qual. J.
☆ Editor: Gabriele Bavota.