Abstract
In this paper, we analyze topic evolution in bioinformatics to uncover the underlying dynamics of the field, focusing on developments in the 2000s. We select 33 bioinformatics-related conferences indexed in DBLP from 2000 to 2011. We choose DBLP rather than PubMed as the data source because DBLP covers most bioinformatics-related conferences, and conference papers are better suited than journal papers to studying the dynamics of a field. We divide these twelve years into four periods: period 1 (2000–2002), period 2 (2003–2005), period 3 (2006–2008), and period 4 (2009–2011). The topic evolution analysis consists of three major procedures, for each of which we develop a novel technique: Markov Random Field-based topic clustering, automatic cluster labeling, and topic similarity based on Within-Period Cluster Similarity (WPCS) and Between-Period Cluster Similarity (BPCS). The experimental results show distinct topic transition patterns between time periods. From period 1 to period 3, new topics appear to emerge and expand, whereas from period 3 to period 4, topics merge and display more rigorous interaction with one another. This trend is confirmed by the collaboration pattern over time.
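
The four-period split described above can be sketched as a simple year-to-period mapping. This is only an illustration of the binning stated in the abstract; the function name `period_of` and its parameters are hypothetical and do not come from the paper.

```python
def period_of(year: int, start: int = 2000, width: int = 3, n_periods: int = 4) -> int:
    """Map a publication year to a 1-based period index.

    Defaults follow the abstract: four three-year periods starting in 2000,
    i.e. 2000-2002 -> 1, 2003-2005 -> 2, 2006-2008 -> 3, 2009-2011 -> 4.
    """
    end = start + width * n_periods  # first year outside the study window
    if not start <= year < end:
        raise ValueError(f"year {year} is outside {start}-{end - 1}")
    return (year - start) // width + 1


if __name__ == "__main__":
    for y in (2000, 2002, 2003, 2008, 2011):
        print(y, "-> period", period_of(y))
```

Years outside 2000–2011 raise an error rather than being silently assigned to a neighbouring period, since the study covers only those twelve years.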
Acknowledgments
This work was supported by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (NRF-2012-2012S1A3A2033291) and by an NRF grant funded by the Korean Government (No. 2012033242).
Appendices
Appendix 1: The list of conference full names
Conference name | Conference full name |
---|---|
APBC | Asia–Pacific Bioinformatics Conference |
AVBPA | Audio- and Video-Based Biometric Person Authentication |
BIBE | Bioinformatics and Bioengineering |
BIBM | Bioinformatics and Biomedicine |
BIDM | Biological Data Management |
B-interface | Bio-inspired Human–Machine Interfaces and Healthcare Applications |
BIOCOMP | Bioinformatics & Computational Biology |
BIODEVICES | Biomedical Electronics and Devices |
BIOINFORMATICS | International Conference on Bioinformatics |
BIOSIG | Biometrics and Electronic Signatures |
BIOSIGNALS | Bio-inspired Systems and Signal Processing |
BIOSTEC | Biomedical Engineering Systems and Technologies |
BMEI | BioMedical Engineering and Informatics |
BSBT | Bio-Science and Bio-Technology |
CIBCB | Computational Intelligence in Bioinformatics and Computational Biology |
CMSB | Computational Methods in Systems Biology |
CSB | Computational Systems Bioinformatics |
DNA Computing | DNA Computing |
ECCB | European Conference on Computational Biology |
EuroGP | European Conference on Genetic Programming |
EvoBIO | Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics |
FOGA | Foundations of Genetic Algorithms |
GCB | German Conference on Bioinformatics |
GECCO | Genetic and Evolutionary Computation Conference |
ISBI | International Symposium on Biomedical Imaging |
ISBRA | International Symposium on Bioinformatics Research and Applications |
ISMB | Intelligent Systems for Molecular Biology |
PRIB | Pattern Recognition in Bioinformatics |
PSB | Pacific Symposium on Biocomputing |
RECOMB | Research in Computational Molecular Biology |
WABI | Workshop on Algorithms in Bioinformatics |
WBIR | Workshop on Biomedical Image Registration |
WOB | Brazilian Workshop on Bioinformatics |
Appendix 2: Cluster similarity between clusters by BPCS for the “same” relation
Period 1 cluster | Period 2 cluster | BPCS |
---|---|---|
Cluster 11 | Cluster 8 | 0.797 |
Cluster 17 | Cluster 0 | 0.778 |
Cluster 7 | Cluster 10 | 0.770 |
Cluster 17 | Cluster 8 | 0.764 |
Cluster 3 | Cluster 6 | 0.763 |
Cluster 17 | Cluster 14 | 0.756 |
Cluster 14 | Cluster 14 | 0.750 |
Cluster 6 | Cluster 8 | 0.748 |
Cluster 4 | Cluster 7 | 0.747 |
Cluster 9 | Cluster 5 | 0.747 |
Cluster 17 | Cluster 6 | 0.745 |
Cluster 7 | Cluster 8 | 0.736 |
Cluster 7 | Cluster 14 | 0.735 |

Period 2 cluster | Period 3 cluster | BPCS |
---|---|---|
Cluster 8 | Cluster 10 | 0.848 |
Cluster 0 | Cluster 2 | 0.836 |
Cluster 3 | Cluster 16 | 0.820 |
Cluster 4 | Cluster 10 | 0.815 |
Cluster 14 | Cluster 8 | 0.813 |
Cluster 12 | Cluster 17 | 0.801 |
Cluster 1 | Cluster 0 | 0.800 |
Cluster 12 | Cluster 1 | 0.799 |
Cluster 12 | Cluster 8 | 0.794 |
Cluster 3 | Cluster 9 | 0.793 |
Cluster 7 | Cluster 9 | 0.789 |
Cluster 15 | Cluster 9 | 0.785 |
Cluster 1 | Cluster 19 | 0.784 |
Cluster 14 | Cluster 14 | 0.784 |

Period 3 cluster | Period 4 cluster | BPCS |
---|---|---|
Cluster 8 | Cluster 3 | 0.846 |
Cluster 5 | Cluster 17 | 0.833 |
Cluster 2 | Cluster 0 | 0.830 |
Cluster 4 | Cluster 17 | 0.829 |
Cluster 12 | Cluster 8 | 0.824 |
Cluster 19 | Cluster 14 | 0.823 |
Cluster 10 | Cluster 14 | 0.823 |
Cluster 15 | Cluster 3 | 0.822 |
Cluster 10 | Cluster 3 | 0.810 |
Cluster 9 | Cluster 11 | 0.805 |
Cluster 0 | Cluster 18 | 0.800 |
Cluster 16 | Cluster 3 | 0.797 |
Cluster 8 | Cluster 17 | 0.794 |
Cluster 6 | Cluster 5 | 0.793 |
Cluster 18 | Cluster 5 | 0.793 |
Cluster 5 | Cluster 14 | 0.793 |
Cluster 10 | Cluster 17 | 0.792 |
Cluster 2 | Cluster 17 | 0.790 |
Cluster 11 | Cluster 3 | 0.789 |
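
One way to read the period 3 to period 4 rows above is to group the earlier clusters by the later cluster they match. The sketch below does only that: it is not the authors' procedure, the names `predecessors` and `merge_targets` are hypothetical, and treating two or more matching predecessors as a "merge" is an assumption made for illustration.

```python
from collections import defaultdict

# (period 3 cluster, period 4 cluster, BPCS), copied from the rows above.
PAIRS_3_TO_4 = [
    (8, 3, 0.846), (5, 17, 0.833), (2, 0, 0.830), (4, 17, 0.829),
    (12, 8, 0.824), (19, 14, 0.823), (10, 14, 0.823), (15, 3, 0.822),
    (10, 3, 0.810), (9, 11, 0.805), (0, 18, 0.800), (16, 3, 0.797),
    (8, 17, 0.794), (6, 5, 0.793), (18, 5, 0.793), (5, 14, 0.793),
    (10, 17, 0.792), (2, 17, 0.790), (11, 3, 0.789),
]


def predecessors(pairs):
    """Group earlier-period clusters by the later-period cluster they match."""
    incoming = defaultdict(list)
    for src, dst, _score in pairs:
        incoming[dst].append(src)
    return dict(incoming)


def merge_targets(pairs, min_sources=2):
    """Later-period clusters matched by at least `min_sources` earlier clusters.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    return {
        dst: sorted(srcs)
        for dst, srcs in predecessors(pairs).items()
        if len(srcs) >= min_sources
    }


if __name__ == "__main__":
    for dst, srcs in sorted(merge_targets(PAIRS_3_TO_4).items()):
        print(f"period 4 cluster {dst} <- period 3 clusters {srcs}")
```

Under this reading, period 4 clusters 3 and 17 are each matched by five period 3 clusters, which is in line with the abstract's remark that topics merge between periods 3 and 4.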
Appendix 3: Topic transition with top-level MeSH terms from period 1 to period 4

Appendix 4: Cluster similarity within a period by WPCS for the “same” relation
Period 1 cluster | Period 1 cluster | WPCS |
---|---|---|
Cluster 7 | Cluster 11 | 0.729 |
Cluster 7 | Cluster 12 | 0.706 |
Cluster 9 | Cluster 16 | 0.704 |
Cluster 12 | Cluster 13 | 0.699 |
Cluster 7 | Cluster 9 | 0.689 |
Cluster 1 | Cluster 9 | 0.677 |

Period 2 cluster | Period 2 cluster | WPCS |
---|---|---|
Cluster 5 | Cluster 14 | 0.814 |
Cluster 7 | Cluster 13 | 0.806 |
Cluster 0 | Cluster 3 | 0.801 |
Cluster 7 | Cluster 15 | 0.800 |
Cluster 6 | Cluster 8 | 0.783 |
Cluster 7 | Cluster 14 | 0.772 |
Cluster 12 | Cluster 16 | 0.767 |
Cluster 1 | Cluster 13 | 0.766 |
Cluster 4 | Cluster 18 | 0.760 |

Period 3 cluster | Period 3 cluster | WPCS |
---|---|---|
Cluster 4 | Cluster 10 | 0.813 |
Cluster 2 | Cluster 10 | 0.791 |
Cluster 2 | Cluster 5 | 0.788 |
Cluster 0 | Cluster 2 | 0.778 |
Cluster 0 | Cluster 4 | 0.772 |
Cluster 5 | Cluster 10 | 0.769 |

Period 4 cluster | Period 4 cluster | WPCS |
---|---|---|
Cluster 14 | Cluster 17 | 0.860 |
Cluster 3 | Cluster 14 | 0.817 |
Cluster 3 | Cluster 17 | 0.811 |
Cluster 2 | Cluster 17 | 0.799 |
Cluster 5 | Cluster 14 | 0.796 |
Cluster 3 | Cluster 13 | 0.789 |
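
The within-period pairs above can also be viewed as edges of an undirected graph, whose connected components show which clusters are linked, directly or indirectly, by a listed similarity. The sketch below is a reading aid under that assumption rather than part of the paper's method; `components` and `WPCS_PAIRS` are hypothetical names, and only the period 2 and period 4 rows are included.

```python
def components(pairs):
    """Return the connected components of an undirected graph given as edges."""
    adjacency = {}
    for a, b in pairs:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    seen, groups = set(), []
    for start in sorted(adjacency):
        if start in seen:
            continue
        # Depth-first traversal from an unvisited cluster.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(adjacency[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


# Cluster pairs copied from the period 2 and period 4 rows above.
WPCS_PAIRS = {
    2: [(5, 14), (7, 13), (0, 3), (7, 15), (6, 8), (7, 14), (12, 16), (1, 13), (4, 18)],
    4: [(14, 17), (3, 14), (3, 17), (2, 17), (5, 14), (3, 13)],
}

if __name__ == "__main__":
    for period, pairs in WPCS_PAIRS.items():
        print(f"period {period}: {components(pairs)}")
```

On these rows, the period 2 pairs fall into five separate groups, while the period 4 pairs form a single group of six clusters.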
Cite this article
Song, M., Heo, G.E. & Kim, S.Y. Analyzing topic evolution in bioinformatics: investigation of dynamics of the field with conference data in DBLP. Scientometrics 101, 397–428 (2014). https://doi.org/10.1007/s11192-014-1246-2