Abstract
Total or partial duplication of documents reduces the effectiveness of search-result visualization. We propose a navigation strategy that sorts a list of documents so that the first documents carry more information content, considerably decreasing duplication. The strategy defines a content relation between documents based on their equivalence and omission, and estimates the new information content obtained from visiting each document. We describe the strategy and evaluate it experimentally. The results indicate the potential of this strategy for visualizing thematically related documents that are relevant to a query.
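The idea of ordering results by the new information each one contributes can be illustrated with a minimal sketch. It is not the authors' algorithm: it assumes each document is reduced to a set of content units (here, plain strings), treats two documents with identical sets as equivalent and a document whose set is contained in another's as an omission, and uses a simple greedy rule. The function name `order_by_new_content` and the toy data are hypothetical.

```python
from typing import Dict, List, Set


def order_by_new_content(docs: Dict[str, Set[str]]) -> List[str]:
    """Greedy ordering: repeatedly pick the document that adds the most
    content units not yet seen in the documents already chosen.

    Assumption: a document is modeled as a set of content units; this is
    an illustrative stand-in for the paper's content relation.
    """
    seen: Set[str] = set()
    remaining = dict(docs)
    order: List[str] = []
    while remaining:
        # Iterate ids in sorted order so ties go to the smallest id.
        best = max(sorted(remaining), key=lambda d: len(remaining[d] - seen))
        order.append(best)
        seen |= remaining.pop(best)
    return order


# Toy data (hypothetical): "c" is equivalent to "a", "b" omits part of "a".
docs = {
    "a": {"x", "y", "z"},
    "b": {"x", "y"},
    "c": {"x", "y", "z"},
    "d": {"w", "v"},
}
print(order_by_new_content(docs))
```

With this toy input, the documents that duplicate or omit content already covered ("b" and "c") are pushed behind "d", which is the only remaining source of unseen content after "a" is visited.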
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Bilbao, R., Rodríguez, M.A. (2007). Navigating Among Search Results: An Information Content Approach. In: Benatallah, B., Casati, F., Georgakopoulos, D., Bartolini, C., Sadiq, W., Godart, C. (eds) Web Information Systems Engineering – WISE 2007. WISE 2007. Lecture Notes in Computer Science, vol 4831. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-76993-4_58
DOI: https://doi.org/10.1007/978-3-540-76993-4_58
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-76992-7
Online ISBN: 978-3-540-76993-4
eBook Packages: Computer Science, Computer Science (R0)