Elsevier

Journal of Informetrics

Volume 5, Issue 4, October 2011, Pages 608-617

Non-alphanumeric characters in titles of scientific publications: An analysis of their occurrence and correlation with citation impact

https://doi.org/10.1016/j.joi.2011.05.008

Abstract

We investigated the occurrence of non-alphanumeric characters in a randomized subset of almost 650,000 titles of scientific publications from the Web of Science database. Additionally, for almost 500,000 of these publications we correlated occurrence with impact, using the field-normalised citation metric CPP/FCSm. We compared occurrence and correlation with impact both in general and for specific disciplines, and took into account the variation within sets by (non-parametrically) bootstrapping the calculation of impact values. We also compared the use and impact of individual characters in the 30 fields in which non-alphanumeric characters occur most frequently, by using heatmaps that clustered and reordered fields and characters. We conclude that the use of some non-alphanumeric characters, such as the hyphen and colon, is common in most titles, and that not including such characters generally correlates negatively with impact. Specific disciplines, on the other hand, may show a negative, absent, or positive correlation. We also found that thematically related science fields use non-alphanumeric characters in comparable numbers, but that the impact associated with such characters shows a weaker thematic relation. Overall, it appears that authors cannot influence the success of publications by including non-alphanumeric characters in fields where this is not already commonplace.
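The abstract mentions bootstrapping the field-normalised CPP/FCSm indicator to account for variation within sets of publications. The following is a minimal sketch in Python (NumPy), not the authors' procedure. It assumes the usual definition in which CPP/FCSm equals the total citations of a set divided by the total of its publications' expected (field-mean) citation scores, and it assumes a plain percentile bootstrap. The function names, the number of resamples, and the toy data are hypothetical.

```python
import numpy as np


def cpp_fcsm(citations, expected):
    """CPP/FCSm for one set of publications.

    Mean citations per publication (CPP) divided by the mean expected
    field citation score (FCSm); with equal weights per publication this
    equals sum(citations) / sum(expected).
    """
    return np.sum(citations) / np.sum(expected)


def bootstrap_cpp_fcsm(citations, expected, n_boot=1000, seed=0):
    """Non-parametric bootstrap of CPP/FCSm (hypothetical helper).

    Publications are resampled with replacement, keeping each paper's
    citation count paired with its expected score, and the indicator is
    recomputed on every resample.
    """
    citations = np.asarray(citations, dtype=float)
    expected = np.asarray(expected, dtype=float)
    rng = np.random.default_rng(seed)
    n = citations.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # indices of one bootstrap resample
        stats[b] = cpp_fcsm(citations[idx], expected[idx])
    return stats


def percentile_interval(stats, level=0.95):
    """Percentile confidence interval from bootstrap replicates."""
    tail = 100 * (1 - level) / 2
    low, high = np.percentile(stats, [tail, 100 - tail])
    return low, high


if __name__ == "__main__":
    # Hypothetical toy data: titles with a colon vs. titles without one.
    rng = np.random.default_rng(42)
    exp_with = rng.uniform(2, 10, size=200)
    cit_with = rng.poisson(1.2 * exp_with)
    exp_without = rng.uniform(2, 10, size=200)
    cit_without = rng.poisson(0.9 * exp_without)
    for label, c, e in [("with colon", cit_with, exp_with),
                        ("without colon", cit_without, exp_without)]:
        point = cpp_fcsm(c, e)
        low, high = percentile_interval(bootstrap_cpp_fcsm(c, e))
        print(f"{label}: CPP/FCSm = {point:.2f} (95% CI {low:.2f} to {high:.2f})")
```

Comparing the two intervals, rather than the two point estimates alone, gives a rough indication of whether a difference in impact between titles with and without a character exceeds the variation within each set.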

Highlights

► Over two-thirds of the titles of scientific publications have at least one non-alphanumeric character, mostly the hyphen, colon, comma, or parentheses. ► In general, publications with non-alphanumeric characters in their titles are associated with higher citation impact than publications with only alphanumeric characters in their titles, but this may differ in specific disciplines. ► The relative amount of non-alphanumeric characters did not increase over the last 10 years, contrasting with earlier results. ► Thematically related fields show comparable numbers of specific non-alphanumeric characters.
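The first highlight rests on two counts: the share of titles with at least one non-alphanumeric character, and, for each character, the number and percentage of titles that contain it. A minimal Python sketch of both counts follows. It assumes that whitespace is not counted and that "alphanumeric" follows Python's `str.isalnum`, which also treats accented and Greek letters as alphanumeric; the authors' exact character definition may differ, and the example titles are invented.

```python
from collections import Counter


def non_alnum_chars(title):
    """Distinct non-alphanumeric, non-whitespace characters in a title."""
    return {ch for ch in title if not ch.isalnum() and not ch.isspace()}


def share_with_any(titles):
    """Fraction of titles containing at least one non-alphanumeric character."""
    return sum(1 for t in titles if non_alnum_chars(t)) / len(titles)


def occurrence_table(titles):
    """Rows (character, N, %) sorted by N, where N counts titles, not
    occurrences: a title with three hyphens adds one to the hyphen's N."""
    counts = Counter()
    for title in titles:
        counts.update(non_alnum_chars(title))
    total = len(titles)
    return [(ch, n, 100.0 * n / total) for ch, n in counts.most_common()]


if __name__ == "__main__":
    # Invented example titles.
    titles = [
        "Gene expression in yeast: a review",
        "Protein folding kinetics",
        "Self-assembly of block co-polymers (in vitro)",
        "Is the colon over?",
    ]
    print(f"{share_with_any(titles):.0%} of titles have a non-alphanumeric character")
    for ch, n, pct in occurrence_table(titles):
        print(f"{ch!r}: N={n}, {pct:.1f}%")
```

Counting each character at most once per title keeps N comparable to a number of publications, so that N divided by the sample size gives a percentage of titles rather than a per-character frequency.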

Introduction

Every day, the inbox of a modern researcher readily fills up with emails from friends, colleagues, and even complete strangers. Moreover, at regular intervals, emails arrive that contain titles of interesting publications which have recently been added to databases such as PubMed, Scopus, or the Web of Science. Furthermore, personal messages, electronic forums, web sites, and social networks all require attention and time. Evidently, new scientific literature is only one stream of information that nowadays flows towards a researcher, albeit a rather pivotal one for the profession at hand. Some time ago, Meadows (1974) estimated that an average researcher had to scan roughly 3000 titles per year. We assume that this number has only grown, and that the increased information burden leaves even less time to deal with each title. Clearly, to get the attention of potential readers, it is crucial that a publication is presented effectively to a researcher. In many cases, the title is the way to accomplish this (Soler, 2007). Of course, an author could try a tactic employed by writers of certain emails BEGGING FOR attention. Yet, there is a good chance that this will annoy and subsequently put off potential readers, and since being read is an important factor in the professional success of authors, this is evidently not desirable. As writing and publishing is a communal effort, readers are used to certain topics and styles. Authors can use this to their benefit by using familiar ways of phrasing a title, which facilitate quick reading, and by using signal words that are expected to trigger the interest of an audience. Yet, phrasing a title too generically can bore: a title has to stand out too. Standing out can be accomplished by phrasing differently, for example by using a well-known (but within science uncommon) literary template such as “to X or not to X” (filling in the particular topic of interest for X). Alternatively, it could be as simple as using particular non-alphanumeric characters in a title.

Specific non-alphanumeric characters and title characteristics have been the subject of previous research. Early studies by Dillon (1981, 1982) showed that the colon (“:”) has become a standard character in titles of scientific publications. Lewison and Hartley (2005) also studied the colon and found differences in title length and colon usage, both over time and across disciplines. Hartley (2007) combined a meta-analysis with new results and showed that colons are preferred by students because they improve the structure of a title, but are not necessarily appreciated by their fellow academics, who make up the intended audience of most scientific publications. However, studies cited by Hartley (2007) failed to find significant differences between the number of citations for publications with and without colons in their title, although the scope of this result was limited to a single journal. Besides the colon, Ball (2009) showed that the question mark has become a frequently appearing character in titles in Medicine and (to a lesser extent) in Physics. We generalize these previous studies on specific aspects of titles and investigate both the use of specific characters in publication titles and their correlation with impact in a broad and extensive sense. By this, we mean that we neither focus on a particular (non-alphanumeric) character nor limit our investigation to specific journals or science fields.

Our main research question is: given the importance of readership in the success of scientific publications, could something as simple as using a particular type of character “boost” the success of a publication? Our hypothesis is that the effect of non-alphanumeric characters on the success of publications is constrained by conventions regarding readability and form. Consequently, if such characters occur and exhibit a positive correlation with the success of publications, those characters usually have a known function or are accepted elements. We investigate this by posing the following research questions. First, which non-alphanumeric characters occur in titles of scientific publications? Second, can we see a difference in the success of publications with and without such characters? Third, are such effects global, or can we see differences across disciplines? Fourth, what is the effect of frequently occurring characters? Finally, how do the use and impact of characters compare across fields?
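For the last question, the abstract mentions heatmaps whose fields and characters are clustered and reordered. A minimal Python sketch of one way to obtain such an ordering is agglomerative hierarchical clustering with average linkage on Euclidean distances between field profiles, reading off the dendrogram's leaf order as the heatmap row order. The linkage method, distance measure, helper names, and example profiles are all assumptions for illustration; the text here does not state which settings were used.

```python
import math


def average_linkage_order(profiles):
    """Leaf order of an average-linkage dendrogram (hypothetical helper).

    profiles: dict mapping a field name to a vector of character shares,
    with the same character order for every field. Returns the field names
    ordered so that similar fields end up next to each other.
    """
    if not profiles:
        return []
    # Each cluster is a list of field names in dendrogram leaf order.
    clusters = [[name] for name in profiles]

    def distance(c1, c2):
        # Average linkage: mean Euclidean distance over all member pairs.
        total = sum(math.dist(profiles[a], profiles[b]) for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > 1:
        # Find the closest pair of clusters.
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = distance(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]  # keep each side's leaves contiguous
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


if __name__ == "__main__":
    # Hypothetical shares of titles containing ":", "-", "?" per field.
    profiles = {
        "Physics": [0.30, 0.45, 0.01],
        "Psychology": [0.55, 0.30, 0.06],
        "Chemistry": [0.28, 0.50, 0.01],
        "Sociology": [0.60, 0.25, 0.08],
    }
    print(average_linkage_order(profiles))  # row order for the heatmap
```

Applying the same function to the transposed matrix, with characters as keys and per-field shares as vectors, would give a column order. A library routine such as SciPy's hierarchical clustering would be the usual choice at scale; the explicit loop here only keeps the sketch dependency-free.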


Method

To investigate non-alphanumeric characters in titles, we extracted publications from all research fields available in the Web of Science database (WoS), published in the period 1999–2008. However, the number of publications available in the WoS for that period is large (almost 13 million), which makes exhaustive analyses too time-consuming, and we

Occurrence of non-alphanumeric characters

Our 5% random sample consisted of 642,807 WoS publications, all published between 1999 and 2008. Table 1 lists the 29 non-alphanumeric characters we encountered in the titles of these publications. Besides rank (#) and character (C), this table shows the number of publications (N) associated with each character, the percentage (%) relative to all publications in the sample, a point estimate of the impact (I), and the number of publications (articles, letters, notes, reviews) used

Conclusions

We started this publication by pointing out that nowadays there are many sources of information which require the attention of a scientific researcher. As a result, searching for new, potentially interesting scientific publications has to compete with the other sources of information that a researcher has to scan every day. We then hypothesized that, therefore, the title of a publication that aims to capture an audience needs to strike a balance between conforming and

Acknowledgements

We thank the anonymous reviewers for sharpening our arguments, as well as for their suggestions for future research.

References (26)

  • J.T. Dillon (1981). The emergence of the colon: An empirical correlate of scholarship. American Psychologist.
  • J.T. Dillon (1982). In pursuit of the colon: A century of scholarly progress: 1880–1980. The Journal of Higher Education.
  • B. Efron et al. (1993). An introduction to the bootstrap.