
Analyzing topic evolution in bioinformatics: investigation of dynamics of the field with conference data in DBLP

Scientometrics

Abstract

In this paper we analyze how topics in bioinformatics evolved over time to uncover the underlying dynamics of the field, focusing on developments in the 2000s. We select 33 bioinformatics-related conferences indexed in DBLP from 2000 to 2011. We chose DBLP rather than PubMed as the data source because DBLP covers most bioinformatics-related conferences, and conference papers are better suited than journal papers to studying the dynamics of the field. We divide the twelve years into four periods: period 1 (2000–2002), period 2 (2003–2005), period 3 (2006–2008) and period 4 (2009–2011). The topic evolution analysis consists of three major procedures, and for each we develop a novel technique: Markov Random Field-based topic clustering, automatic cluster labeling, and topic similarity based on Within-Period Cluster Similarity (WPCS) and Between-Period Cluster Similarity (BPCS). The experimental results show distinct topic transition patterns between time periods. From period 1 to period 3, new topics appear to have emerged and expanded, whereas from period 3 to period 4, topics merged and displayed more rigorous interaction with each other. This trend is confirmed by the collaboration pattern over time.
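
The period division described in the abstract can be illustrated with a minimal Python sketch. The year boundaries come from the abstract; the names `PERIODS` and `period_of` are hypothetical and are not taken from the authors' implementation.

```python
# Period boundaries as stated in the abstract (inclusive years).
PERIODS = {
    1: (2000, 2002),
    2: (2003, 2005),
    3: (2006, 2008),
    4: (2009, 2011),
}


def period_of(year):
    """Return the period number (1-4) for a publication year.

    Returns None when the year falls outside the 2000-2011 study window.
    """
    for period, (start, end) in PERIODS.items():
        if start <= year <= end:
            return period
    return None
```

A paper published in 2007, for instance, would be assigned to period 3, while a paper from 1999 would fall outside the study window.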



Acknowledgments

This work was supported by a National Research Foundation of Korea grant funded by the Korean Government (NRF-2012-2012S1A3A2033291) and by a National Research Foundation of Korea (NRF) grant funded by the Korean Government (No. 2012033242).

Author information

Corresponding author

Correspondence to Min Song.

Appendices

Appendix 1: The list of conference full names

| Conference name | Conference full name |
|---|---|
| APBC | Asia–Pacific Bioinformatics Conference |
| AVBPA | Audio- and Video-Based Biometric Person Authentication |
| BIBE | Bioinformatics and Bioengineering |
| BIBM | Bioinformatics and Biomedicine |
| BIDM | Biological Data Management |
| B-interface | Bio-inspired Human–Machine Interfaces and Healthcare Applications |
| BIOCOMP | Bioinformatics & Computational Biology |
| BIODEVICES | Biomedical Electronics and Devices |
| BIOINFORMATICS | International Conference on Bioinformatics |
| BIOSIG | Biometrics and Electronic Signatures |
| BIOSIGNALS | Bio-inspired Systems and Signal Processing |
| BIOSTEC | Biomedical Engineering Systems and Technologies |
| BMEI | BioMedical Engineering and Informatics |
| BSBT | Bio-Science and Bio-Technology |
| CIBCB | Computational Intelligence in Bioinformatics and Computational Biology |
| CMSB | Computational Methods in Systems Biology |
| CSB | Computational Systems Bioinformatics |
| DNA Computing | DNA Computing |
| ECCB | European Conference on Computational Biology |
| EuroGP | European Conference on Genetic Programming |
| EvoBIO | Evolutionary Computation, Machine Learning and Data Mining in Bioinformatics |
| FOGA | Foundations of Genetic Algorithms |
| GCB | German Conference on Bioinformatics |
| GECCO | Genetic and Evolutionary Computation Conference |
| ISBI | International Symposium on Biomedical Imaging |
| ISBRA | International Symposium on Bioinformatics Research and Applications |
| ISMB | Intelligent Systems in Molecular Biology |
| PRIB | Pattern Recognition in Bioinformatics |
| PSB | Pacific Symposium on Biocomputing |
| RECOMB | Research in Computational Molecular Biology |
| WABI | Workshop on Algorithms in Bioinformatics |
| WBIR | Workshop on Biomedical Image Registration |
| WOB | Brazilian Workshop on Bioinformatics |

Appendix 2: Cluster similarity between clusters by BPCS for the “same” relation

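| Periods | Cluster in earlier period | Cluster in later period | Similarity value |
|---|---|---|---|
| 1 and 2 | Cluster 11 | Cluster 8 | 0.797 |
| 1 and 2 | Cluster 17 | Cluster 0 | 0.778 |
| 1 and 2 | Cluster 7 | Cluster 10 | 0.770 |
| 1 and 2 | Cluster 17 | Cluster 8 | 0.764 |
| 1 and 2 | Cluster 3 | Cluster 6 | 0.763 |
| 1 and 2 | Cluster 17 | Cluster 14 | 0.756 |
| 1 and 2 | Cluster 14 | Cluster 14 | 0.750 |
| 1 and 2 | Cluster 6 | Cluster 8 | 0.748 |
| 1 and 2 | Cluster 4 | Cluster 7 | 0.747 |
| 1 and 2 | Cluster 9 | Cluster 5 | 0.747 |
| 1 and 2 | Cluster 17 | Cluster 6 | 0.745 |
| 1 and 2 | Cluster 7 | Cluster 8 | 0.736 |
| 1 and 2 | Cluster 7 | Cluster 14 | 0.735 |
| 2 and 3 | Cluster 8 | Cluster 10 | 0.848 |
| 2 and 3 | Cluster 0 | Cluster 2 | 0.836 |
| 2 and 3 | Cluster 3 | Cluster 16 | 0.820 |
| 2 and 3 | Cluster 4 | Cluster 10 | 0.815 |
| 2 and 3 | Cluster 14 | Cluster 8 | 0.813 |
| 2 and 3 | Cluster 12 | Cluster 17 | 0.801 |
| 2 and 3 | Cluster 1 | Cluster 0 | 0.800 |
| 2 and 3 | Cluster 12 | Cluster 1 | 0.799 |
| 2 and 3 | Cluster 12 | Cluster 8 | 0.794 |
| 2 and 3 | Cluster 3 | Cluster 9 | 0.793 |
| 2 and 3 | Cluster 7 | Cluster 9 | 0.789 |
| 2 and 3 | Cluster 15 | Cluster 9 | 0.785 |
| 2 and 3 | Cluster 1 | Cluster 19 | 0.784 |
| 2 and 3 | Cluster 14 | Cluster 14 | 0.784 |
| 3 and 4 | Cluster 8 | Cluster 3 | 0.846 |
| 3 and 4 | Cluster 5 | Cluster 17 | 0.833 |
| 3 and 4 | Cluster 2 | Cluster 0 | 0.830 |
| 3 and 4 | Cluster 4 | Cluster 17 | 0.829 |
| 3 and 4 | Cluster 12 | Cluster 8 | 0.824 |
| 3 and 4 | Cluster 19 | Cluster 14 | 0.823 |
| 3 and 4 | Cluster 10 | Cluster 14 | 0.823 |
| 3 and 4 | Cluster 15 | Cluster 3 | 0.822 |
| 3 and 4 | Cluster 10 | Cluster 3 | 0.810 |
| 3 and 4 | Cluster 9 | Cluster 11 | 0.805 |
| 3 and 4 | Cluster 0 | Cluster 18 | 0.800 |
| 3 and 4 | Cluster 16 | Cluster 3 | 0.797 |
| 3 and 4 | Cluster 8 | Cluster 17 | 0.794 |
| 3 and 4 | Cluster 6 | Cluster 5 | 0.793 |
| 3 and 4 | Cluster 18 | Cluster 5 | 0.793 |
| 3 and 4 | Cluster 5 | Cluster 14 | 0.793 |
| 3 and 4 | Cluster 10 | Cluster 17 | 0.792 |
| 3 and 4 | Cluster 2 | Cluster 17 | 0.790 |
| 3 and 4 | Cluster 11 | Cluster 3 | 0.789 |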
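
As a reading aid for the period 3 to period 4 rows above, the following Python sketch groups the listed pairs by their period-4 cluster. The values are copied from this appendix. The helper name `incoming_links` is hypothetical. Reading several period-3 clusters linked to one period-4 cluster as a possible merge is an assumption made here for illustration, not the authors' stated procedure.

```python
from collections import defaultdict

# (period-3 cluster, period-4 cluster, BPCS value), copied from Appendix 2.
PAIRS_3_TO_4 = [
    (8, 3, 0.846), (5, 17, 0.833), (2, 0, 0.830), (4, 17, 0.829),
    (12, 8, 0.824), (19, 14, 0.823), (10, 14, 0.823), (15, 3, 0.822),
    (10, 3, 0.810), (9, 11, 0.805), (0, 18, 0.800), (16, 3, 0.797),
    (8, 17, 0.794), (6, 5, 0.793), (18, 5, 0.793), (5, 14, 0.793),
    (10, 17, 0.792), (2, 17, 0.790), (11, 3, 0.789),
]


def incoming_links(pairs):
    """Map each later-period cluster to the earlier-period clusters linked to it."""
    incoming = defaultdict(list)
    for earlier, later, _value in pairs:
        incoming[later].append(earlier)
    return dict(incoming)


links = incoming_links(PAIRS_3_TO_4)

# Later-period clusters with more than one incoming link (assumed merge candidates).
merge_candidates = {later: srcs for later, srcs in links.items() if len(srcs) > 1}
```

Under that assumption, period-4 clusters 3 and 17 each collect five period-3 clusters, which is in line with the merging pattern the abstract reports for the last transition.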

Appendix 3: Topic transition with top-level MeSH terms from period 1 to period 4

Appendix 4: Cluster similarity within a period by WPCS for the “same” relation

| Period | Cluster | Cluster | Similarity value |
|---|---|---|---|
| 1 | Cluster 7 | Cluster 11 | 0.729 |
| 1 | Cluster 7 | Cluster 12 | 0.706 |
| 1 | Cluster 9 | Cluster 16 | 0.704 |
| 1 | Cluster 12 | Cluster 13 | 0.699 |
| 1 | Cluster 7 | Cluster 9 | 0.689 |
| 1 | Cluster 1 | Cluster 9 | 0.677 |
| 2 | Cluster 5 | Cluster 14 | 0.814 |
| 2 | Cluster 7 | Cluster 13 | 0.806 |
| 2 | Cluster 0 | Cluster 3 | 0.801 |
| 2 | Cluster 7 | Cluster 15 | 0.800 |
| 2 | Cluster 6 | Cluster 8 | 0.783 |
| 2 | Cluster 7 | Cluster 14 | 0.772 |
| 2 | Cluster 12 | Cluster 16 | 0.767 |
| 2 | Cluster 1 | Cluster 13 | 0.766 |
| 2 | Cluster 4 | Cluster 18 | 0.760 |
| 3 | Cluster 4 | Cluster 10 | 0.813 |
| 3 | Cluster 2 | Cluster 10 | 0.791 |
| 3 | Cluster 2 | Cluster 5 | 0.788 |
| 3 | Cluster 0 | Cluster 2 | 0.778 |
| 3 | Cluster 0 | Cluster 4 | 0.772 |
| 3 | Cluster 5 | Cluster 10 | 0.769 |
| 4 | Cluster 14 | Cluster 17 | 0.860 |
| 4 | Cluster 3 | Cluster 14 | 0.817 |
| 4 | Cluster 3 | Cluster 17 | 0.811 |
| 4 | Cluster 2 | Cluster 17 | 0.799 |
| 4 | Cluster 5 | Cluster 14 | 0.796 |
| 4 | Cluster 3 | Cluster 13 | 0.789 |

About this article

Cite this article

Song, M., Heo, G.E. & Kim, S.Y. Analyzing topic evolution in bioinformatics: investigation of dynamics of the field with conference data in DBLP. Scientometrics 101, 397–428 (2014). https://doi.org/10.1007/s11192-014-1246-2

  • DOI: https://doi.org/10.1007/s11192-014-1246-2
