Abstract
The increasing number of publications makes searching and accessing the produced literature a challenging task. A recent development in bibliographic databases is to use advanced information retrieval techniques in combination with bibliographic means like citations. In this work we present an approach that combines a cognitive information retrieval framework based on the principle of polyrepresentation with document clustering to enable the user to explore a collection more interactively than by just examining a ranked result list. Our approach uses information need representations as well as different document representations, including citations. To evaluate our ideas we employ a simulated user strategy utilising a cluster ranking approach. We report on the possible effectiveness of our approach and on several strategies by which users can achieve a higher search effectiveness through cluster browsing. Our results confirm that our proposed polyrepresentative cluster browsing strategy can, in principle, significantly improve search effectiveness. However, further evaluations, including a more refined user simulation, are needed.



Acknowledgments
The authors would like to thank the anonymous reviewers for their insightful comments.
Cite this article
Abbasi, M.K., Frommholz, I. Cluster-based polyrepresentation as science modelling approach for information retrieval. Scientometrics 102, 2301–2322 (2015). https://doi.org/10.1007/s11192-014-1478-1
DOI: https://doi.org/10.1007/s11192-014-1478-1