Cluster-based polyrepresentation as science modelling approach for information retrieval

Abbasi, Muhammad Kamran; Frommholz, Ingo

doi:10.1007/s11192-014-1478-1

Published: 25 November 2014

Volume 102, pages 2301–2322, (2015)

Scientometrics

Muhammad Kamran Abbasi¹ &
Ingo Frommholz¹

An Erratum to this article was published on 03 April 2015

Abstract

The increasing number of publications makes searching and accessing the produced literature a challenging task. A recent development in bibliographic databases is the use of advanced information retrieval techniques in combination with bibliographic means such as citations. In this work we present an approach that combines a cognitive information retrieval framework, based on the principle of polyrepresentation, with document clustering, enabling the user to explore a collection more interactively than by just examining a ranked result list. Our approach uses information need representations as well as different document representations, including citations. To evaluate our ideas we employ a simulated user strategy utilising a cluster ranking approach. We report on the possible effectiveness of our approach and on several strategies by which users can achieve higher search effectiveness through cluster browsing. Our results confirm that our proposed polyrepresentative cluster browsing strategy can in principle significantly improve search effectiveness. However, further evaluations, including a more refined user simulation, are needed.
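
The idea of combining several representations can be illustrated with a toy sketch. It is not the clustering or cluster ranking method evaluated in the article, which the abstract does not specify. It only assumes that each document representation (here the hypothetical names `title`, `abstract` and `citations`) yields a set of retrieved document ids, groups documents by the set of representations that retrieved them, and lists the groups with the largest overlap first. The function name `overlap_groups` and all data are invented for illustration.

```python
from collections import defaultdict


def overlap_groups(results):
    """Group document ids by the set of representations that retrieved them.

    `results` maps a representation name to the set of document ids it
    retrieved. Returns a list of (representations, doc_ids) pairs, with the
    groups retrieved by the most representations first.
    """
    # For every document, collect the representations that retrieved it.
    retrieved_by = defaultdict(set)
    for representation, doc_ids in results.items():
        for doc_id in doc_ids:
            retrieved_by[doc_id].add(representation)

    # Documents retrieved by exactly the same representations share a group.
    groups = defaultdict(set)
    for doc_id, representations in retrieved_by.items():
        groups[frozenset(representations)].add(doc_id)

    # Larger overlap first; ties are broken by representation names so the
    # ordering is deterministic.
    return sorted(
        groups.items(),
        key=lambda item: (-len(item[0]), sorted(item[0])),
    )


# Hypothetical toy data: three representations of the same collection.
results = {
    "title": {"d1", "d2", "d3"},
    "abstract": {"d1", "d2", "d4"},
    "citations": {"d1", "d5"},
}

ranked = overlap_groups(results)
for representations, doc_ids in ranked:
    print(sorted(representations), sorted(doc_ids))
```

In this toy data, `d1` is retrieved under all three representations and therefore appears in the first group, followed by `d2`, which is retrieved under two. A simulated user who browses groups in this order would inspect the documents supported by the most representations first.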

Acknowledgments

The authors would like to thank the anonymous reviewers for their insightful comments.

Author information

Authors and Affiliations

Institute for Research in Applicable Computing, University of Bedfordshire, Park Square, Luton, LU1 3JU, UK
Muhammad Kamran Abbasi & Ingo Frommholz

Corresponding author

Correspondence to Muhammad Kamran Abbasi.

About this article

Cite this article

Abbasi, M.K., Frommholz, I. Cluster-based polyrepresentation as science modelling approach for information retrieval. Scientometrics 102, 2301–2322 (2015). https://doi.org/10.1007/s11192-014-1478-1

Received: 18 October 2014
Published: 25 November 2014
Issue Date: March 2015
DOI: https://doi.org/10.1007/s11192-014-1478-1
