Cluster-based polyrepresentation as science modelling approach for information retrieval

Scientometrics

An Erratum to this article was published on 03 April 2015

Abstract

The increasing number of publications makes searching and accessing the produced literature a challenging task. A recent development in bibliographic databases is to use advanced information retrieval techniques in combination with bibliographic means such as citations. In this work we present an approach that combines a cognitive information retrieval framework based on the principle of polyrepresentation with document clustering, enabling the user to explore a collection more interactively than by just examining a ranked result list. Our approach uses information need representations as well as different document representations, including citations. To evaluate our ideas we employ a simulated user strategy utilising a cluster ranking approach. We report on the possible effectiveness of our approach and on several strategies by which users can achieve a higher search effectiveness through cluster browsing. Our results confirm that our proposed polyrepresentative cluster browsing strategy can in principle significantly improve search effectiveness. However, further evaluations, including a more refined user simulation, are needed.
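To make the idea of combining several document representations with cluster ranking more concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that each representation (for example title, abstract, or citation context) has already produced a set of retrieved document IDs, and it swaps in a simple grouping by representation overlap in place of a real clustering algorithm. The function names, representation names, and toy data are all hypothetical.

```python
from collections import defaultdict


def overlap_clusters(result_sets):
    """Group documents by the set of representations that retrieved them.

    result_sets maps a (hypothetical) representation name to the list of
    document IDs retrieved for that representation.
    """
    doc_reps = defaultdict(set)
    for rep_name, doc_ids in result_sets.items():
        for doc_id in doc_ids:
            doc_reps[doc_id].add(rep_name)

    clusters = defaultdict(list)
    for doc_id, reps in doc_reps.items():
        clusters[frozenset(reps)].append(doc_id)
    return {reps: sorted(docs) for reps, docs in clusters.items()}


def rank_clusters(clusters):
    """Rank clusters so that those backed by more representations come first.

    Ties are broken by the sorted representation names to keep the
    ordering deterministic.
    """
    return sorted(
        clusters.items(),
        key=lambda item: (-len(item[0]), sorted(item[0])),
    )


def precision(doc_ids, relevant):
    """Fraction of doc_ids that are in the set of relevant documents."""
    if not doc_ids:
        return 0.0
    return sum(1 for d in doc_ids if d in relevant) / len(doc_ids)


# Hypothetical toy data: three representations and their retrieved documents.
results = {
    "title": ["d1", "d2", "d3"],
    "abstract": ["d1", "d3", "d4"],
    "citations": ["d1", "d4", "d5"],
}

ranked = rank_clusters(overlap_clusters(results))

# A very simple simulated user who only inspects the top-ranked cluster.
top_cluster_docs = ranked[0][1]
```

In this sketch a simulated user who always opens the top-ranked cluster sees only the documents retrieved by every representation. A more refined simulation could let the user browse further clusters, which is the kind of strategy the abstract refers to.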

Notes

  1. http://itlab.dbit.dk/isearch/.

  2. http://terrier.org/.

Acknowledgments

The authors would like to thank the anonymous reviewers for their insightful comments.

Author information

Corresponding author

Correspondence to Muhammad Kamran Abbasi.

About this article

Cite this article

Abbasi, M.K., Frommholz, I. Cluster-based polyrepresentation as science modelling approach for information retrieval. Scientometrics 102, 2301–2322 (2015). https://doi.org/10.1007/s11192-014-1478-1

  • DOI: https://doi.org/10.1007/s11192-014-1478-1
