Buzzword detection in the scientific scenario☆
Introduction
Every research field has topics that become the main focus of study for its community. Some of these topics attract interest gradually and eventually become the most discussed subject in their area. Others mark, from the moment they appear, the period in which the subject is most intensively explored. Knowing in advance which words will become buzzwords can help companies make strategic decisions about which fields are promising and deserve more attention, possibly securing a pioneering position in certain areas of knowledge.
Buzzwords are new terms or phrases (neologisms) created in a language that become very popular as fashionable words [22]. Informally, a buzzword is a word or phrase tied to a specialized field or group at a particular time or in a particular context, used mostly to impress lay persons. Tracking these terms makes it possible to identify the latest trends around the world, that is, the subjects the public is discussing most or finds most interesting at a given moment. Buzzword detection yields important information, especially in marketing, business, politics, and intelligence [21], [30]. It is therefore very useful to identify these words as early as possible.
What distinguishes buzzwords from most new terms in a language is their exponential growth. It is difficult to predict whether a new word used by a community is destined to become a common dictionary term or is heading toward a tipping point after which it will decline. Neuman et al. [22] cite the example of the term “Web 2.0”, created in 2001 by Tim O’Reilly to describe a turning point for the Web. A year and a half later, the term had gained huge popularity, returning more than 9.5 million results on Google, and by 2009 the number of Google results had reached 422 million [22]. Today, however, “Web 2.0” seems to have lost popularity.
The popularity of buzzwords comes from their use in media such as TV, magazines, newspapers, and social networks. However, a smaller group usually uses these terms before they become popular with the masses. In other words, buzzwords emerge from a restricted community and gradually spread to other communities before becoming widely known. By identifying this type of behavior, it is possible to find potential buzzwords.
The term “buzzword” has historically been associated with the language of the business and technology sectors [20]. Previous studies have focused mainly on the blogosphere [7], [19], [21], [22], [30], or on ways to model bursts of topics [4], [12], [24]. This focus is justified because blogs are sources of information in which users express their opinions and interests in real time and therefore reflect the most current trends. Blogs also provide an ideal setting for studying the dynamics of language. Studies on buzzword detection estimate the likelihood of a topic becoming popular from its temporal variation in blog text, which allows researchers to observe the emergence of new topics and the concentration of interest over time [30]. Thus, a common approach to detecting buzzwords in blogs is to evaluate the growth rate of the number of times a topic is cited in these communities.
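The growth-rate criterion described above can be made concrete with a short example. This is a minimal sketch, assuming yearly citation counts per term are already available: exponential growth c(t) ≈ c0·e^(rt) becomes a straight line after taking logarithms, so a least-squares slope of ln(count) against year estimates the rate r. The functions `growth_rate` and `flag_fast_growing`, the 0.5 threshold, and the toy counts are hypothetical illustrations, not the measure used in the cited blog studies or in this paper.

```python
import math


def growth_rate(counts_by_year: dict[int, int]) -> float:
    """Least-squares slope of ln(count) against year.

    A slope r means counts grow roughly by a factor e**r per year.
    Years with zero counts are skipped because ln(0) is undefined.
    """
    points = [(year, math.log(c)) for year, c in sorted(counts_by_year.items()) if c > 0]
    if len(points) < 2:
        return 0.0  # not enough data to fit a trend
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def flag_fast_growing(term_counts: dict[str, dict[int, int]], min_rate: float = 0.5) -> list[str]:
    """Terms whose fitted yearly growth rate reaches min_rate (a hypothetical threshold)."""
    return sorted(t for t, counts in term_counts.items() if growth_rate(counts) >= min_rate)


# Invented counts for illustration only.
counts = {
    "term_a": {2008: 2, 2009: 6, 2010: 15, 2011: 40},
    "term_b": {2008: 500, 2009: 510, 2010: 495, 2011: 505},
}
print(flag_fast_growing(counts))  # ['term_a']
```

Fitting on the log scale rewards sustained multiplicative growth rather than large absolute counts, which is why the frequent but flat `term_b` is not flagged.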
One unexplored field in which it would be interesting to study buzzwords is the scientific scenario. Because trends in the development of innovations are visible in this setting, buzzwords are also likely to emerge there. According to [3], buzzwords frequently appear in the titles of conference papers and in comments and questions addressed to conference speakers. This occurs mainly because of the strong relationship between innovations and buzzwords [3]. Moreover, in the academic context, a buzzword can represent a community’s interest in a particular subject, so the frequency with which a term is used in scientific publications is worth tracking over the years.
The identification of new buzzwords in the scientific field may indicate the rise of a new research or business area. Their emergence can be detected through technological forecasting studies, which also make it possible to predict the impact of a given innovation. Early buzzword detection is therefore an important contribution to decision-making and market trend analysis.
The organization of the rest of this paper is as follows: Section 2 explains how we obtained and prepared the corpus for the experiments; Section 3 describes the clustering experiments; Section 4 analyzes and compares the results; and, finally, the conclusions of this work are presented in Section 5.
Section snippets
DBLP
For the present study, the DBLP database [15] was used. The DBLP project is currently maintained by the Universität Trier, in Germany. The database contains bibliographic information on more than 2,947,000 documents in computer science, including conference papers, journal articles, series, books, and even Master’s and doctoral theses.
Preprocessing
The preprocessing stage included data cleaning, data transcription to a database, and the selection of articles for use in this work.
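The cleaning and selection steps can be illustrated with a small Python example. This minimal sketch assumes a DBLP-style XML dump in which each publication record (for example `<article>` or `<inproceedings>`) has `<title>` and `<year>` children. It flattens inline formatting tags in titles, lowercases and tokenizes them, drops a deliberately tiny stop-word list, and counts each word per year. The function name `yearly_word_counts`, the stop-word list, and the sample records are hypothetical. The real dblp.xml declares character entities in a DTD that `xml.etree.ElementTree` does not load, and it is large enough that streaming with `ElementTree.iterparse` would likely be preferable; both points are outside this sketch.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter, defaultdict

# Hypothetical stop-word list, deliberately tiny for illustration.
STOPWORDS = {"a", "an", "and", "for", "in", "of", "on", "the", "to", "with", "using"}
# Record types assumed to carry publication titles.
RECORD_TAGS = {"article", "inproceedings", "proceedings", "book",
               "incollection", "phdthesis", "mastersthesis"}


def yearly_word_counts(xml_text: str) -> dict[str, Counter]:
    """Map each title word to a Counter of {year: occurrences}."""
    counts = defaultdict(Counter)
    root = ET.fromstring(xml_text)
    for record in root:
        if record.tag not in RECORD_TAGS:
            continue  # skip non-publication records such as <www>
        title_el = record.find("title")
        year_el = record.find("year")
        if title_el is None or year_el is None or not (year_el.text or "").strip().isdigit():
            continue  # drop records without a usable title or year
        # itertext() joins the text inside inline tags such as <i> or <sub>
        title = "".join(title_el.itertext()).lower()
        year = int(year_el.text)
        for word in re.findall(r"[a-z0-9]+", title):
            if word not in STOPWORDS:
                counts[word][year] += 1
    return counts


# Invented sample records.
sample = """<dblp>
<article key="x/1"><title>Scaling <i>MapReduce</i> Jobs</title><year>2010</year></article>
<inproceedings key="x/2"><title>MapReduce for the Cloud</title><year>2011</year></inproceedings>
<www key="x/3"><title>Home Page</title></www>
</dblp>"""
counts = yearly_word_counts(sample)
print(dict(counts["mapreduce"]))  # {2010: 1, 2011: 1}
```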
Formatting tags
Clustering
The first analysis presented in this work was the clustering of words extracted from article titles. Its goal was to identify buzzwords by evaluating the behavior of the cluster in which they were grouped. A word’s frequency over the years can help to detect buzzwords: the use of these terms generally rises sharply during the period in which they are buzzwords, as with the word “mapreduce” (Fig. 2).
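Clustering words by the shape of their yearly frequency curves can be illustrated with a small k-means example. This sketch assumes each word’s yearly counts have already been collected. Each curve is scaled by its own maximum so that k-means groups words by shape rather than by volume, and the cluster whose centroid rises most from the first to the last year is taken as the buzzword candidate cluster. The max scaling, the farthest-point initialization, the `rising_cluster` criterion, and the toy counts are all assumptions for illustration; the snippet does not state the exact features or settings used in the experiments.

```python
def profile(counts_by_year, years):
    """Yearly counts scaled by their maximum, so clustering compares curve shapes."""
    values = [counts_by_year.get(y, 0) for y in years]
    peak = max(values) or 1  # avoid dividing by zero for a word absent in every year
    return [v / peak for v in values]


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=100):
    """Lloyd's k-means with deterministic farthest-point initialization."""
    centroids = [list(vectors[0])]
    while len(centroids) < k:
        farthest = max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids))
        centroids.append(list(farthest))
    labels = None
    for _ in range(iterations):
        new_labels = [min(range(k), key=lambda c: sq_dist(v, centroids[c])) for v in vectors]
        if new_labels == labels:
            break  # assignments unchanged: converged
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


def rising_cluster(centroids):
    """Index of the centroid whose curve rises most from the first to the last year."""
    return max(range(len(centroids)), key=lambda c: centroids[c][-1] - centroids[c][0])


years = list(range(2008, 2013))
counts = {  # invented counts for illustration only
    "term_a": {2008: 1, 2009: 3, 2010: 9, 2011: 20, 2012: 40},
    "term_b": {2009: 2, 2010: 5, 2011: 12, 2012: 30},
    "term_c": {2008: 50, 2009: 52, 2010: 49, 2011: 51, 2012: 50},
    "term_d": {2008: 30, 2009: 31, 2010: 29, 2011: 30, 2012: 30},
}
words = list(counts)
vectors = [profile(counts[w], years) for w in words]
labels, centroids = kmeans(vectors, k=2)
target = rising_cluster(centroids)
candidates = [w for w, lab in zip(words, labels) if lab == target]
print(candidates)  # ['term_a', 'term_b']
```

With these invented counts, the two steadily rising terms share a cluster even though their absolute counts differ, while the two flat terms form the other cluster.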
The dispersion curve over the years does not
Discussion
During our experiments, we noticed that several words were identified as 2012 buzzword candidates by both the k-means and SOM algorithms. Among these potential buzzwords were “pomdp” and “pomdps”, whose frequencies over the years are shown in Fig. 6. Indeed, the dispersion of these terms is quite similar to what is expected from a buzzword.
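For comparison, a self-organizing map can be applied to the same kind of scaled frequency curves. The example below is a generic one-dimensional Kohonen map in plain Python, not the SOM configuration used in the experiments: the number of nodes, the linear learning-rate decay, the Gaussian neighborhood, the linear initialization, and the toy counts are all assumptions. Words mapped to the same node as a known rising term become candidates, and the candidate lists produced by the two methods can then be intersected to find words flagged by both.

```python
import math
import random


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def best_matching_unit(weights, v):
    """Index of the node whose weight vector is closest to v."""
    return min(range(len(weights)), key=lambda j: sq_dist(weights[j], v))


def init_weights(vectors, n_nodes):
    """Linear initialization (needs n_nodes >= 2): nodes evenly spaced
    between vectors[0] and the vector farthest from it."""
    start = vectors[0]
    end = max(vectors, key=lambda v: sq_dist(v, start))
    return [[s + (e - s) * j / (n_nodes - 1) for s, e in zip(start, end)]
            for j in range(n_nodes)]


def train_som(vectors, n_nodes=3, epochs=100, lr0=0.5, radius0=1.0, seed=0):
    """Train a 1-D SOM and return its node weight vectors."""
    rng = random.Random(seed)
    weights = init_weights(vectors, n_nodes)
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                    # learning rate decays linearly
        radius = max(radius0 * (1 - frac), 0.5)  # neighborhood shrinks to a floor
        for v in rng.sample(vectors, len(vectors)):  # one shuffled pass per epoch
            bmu = best_matching_unit(weights, v)
            for j, w in enumerate(weights):
                # Gaussian neighborhood measured along the 1-D node grid
                h = math.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                for d in range(len(w)):
                    w[d] += lr * h * (v[d] - w[d])
    return weights


def profile(counts_by_year, years):
    """Yearly counts scaled by their maximum."""
    values = [counts_by_year.get(y, 0) for y in years]
    peak = max(values) or 1
    return [v / peak for v in values]


years = list(range(2008, 2013))
counts = {  # invented counts for illustration only
    "term_a": {2008: 1, 2009: 3, 2010: 9, 2011: 20, 2012: 40},
    "term_b": {2009: 2, 2010: 5, 2011: 12, 2012: 30},
    "term_c": {2008: 50, 2009: 52, 2010: 49, 2011: 51, 2012: 50},
    "term_d": {2008: 30, 2009: 31, 2010: 29, 2011: 30, 2012: 30},
}
words = list(counts)
vectors = [profile(counts[w], years) for w in words]
weights = train_som(vectors)
bmus = {w: best_matching_unit(weights, v) for w, v in zip(words, vectors)}
print(bmus)
```

Unlike k-means, the SOM also pulls neighboring nodes toward each sample, so nodes stay ordered along the grid; with only a few curve shapes, this ordering is what keeps the rising and flat terms on opposite ends of the map.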
After checking whether or not this term is a buzzword, we
Conclusion
The aim of this work was to study the annual occurrence of terms in article titles in order to identify buzzwords. A comparative analysis of the methods showed that the results of the clustering runs performed with k-means and SOM were consistent. The dispersion curves of the centroids of the clusters containing potential buzzwords for each year are similar for both approaches and describe growth behavior that
Acknowledgments
We would like to acknowledge the financial support from CNPq and Capes.
References (31)
- Automated handwashing assistance for persons with dementia using video and a partially observable Markov decision process. Comput. Vis. Image Underst. (2010).
- DBLP: Some lessons learned. Proc. VLDB Endow. (2009).
- Auto-WEKA: Combined selection and hyperparameter optimization of classification algorithms. Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2013).
- Detecting buzz from time-sequenced document streams. Proceedings of the 2005 IEEE International Conference on e-Technology, e-Commerce and e-Service (EEE’05) (2005).
- Autonomous safety decision-making in intelligent robotic systems in the uncertain environments. Proceedings of the Annual Meeting of the North American Fuzzy Information Processing Society (NAFIPS 2008) (2008).
- An improvement in k-mean clustering algorithm using better time and accuracy. Int. J. Program. Lang. Appl. (2013).
- Handbook of Research on Small Business and Entrepreneurship (2014).
- Finding bursty topics from microblogs. Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics: Long Papers, Volume 1 (2012).
- Prospecção tecnológica em materiais: aumento da eficiência do tratamento bibliométrico: aplicação na análise de tratamentos de superfície resistentes ao desgaste [Technological prospecting in materials: increasing the efficiency of bibliometric processing: application to the analysis of wear-resistant surface treatments]. 213 leaves (2001).
- BioWeka: extending the Weka framework for bioinformatics. Bioinformatics (2007).
- BlogPulse: Automated trend discovery for weblogs. Proceedings of the WWW 2004 Workshop on the Weblogging Ecosystem: Aggregation, Analysis and Dynamics.
- Bibliometria: uma ferramenta estatística para a gestão da informação e do conhecimento, em sistemas de informação, de comunicação e de avaliação científica e tecnológica [Bibliometrics: a statistical tool for information and knowledge management in information, communication, and scientific and technological evaluation systems]. Encontro Nacional de Ciência da Informação.
- The WEKA data mining software: An update. SIGKDD Explor. Newsl.
- An index to quantify an individual’s scientific research output.
- Bursty and hierarchical structure in streams. Data Min. Knowl. Discov.
☆ This paper has been recommended for acceptance by Jie Zou.