Recommended Reading
Broder AZ, Glassman SC, Manasse MS, Zweig G. Syntactic clustering of the web. Comput Netw. 1997;29(8–13):1157–66.
Chowdhury A, Frieder O, Grossman DA, McCabe MC. Collection statistics for fast duplicate document detection. ACM Trans Inf Syst. 2002;20(2):171–91.
Cutting DR, Pedersen JO, Karger D, Tukey JW. Scatter/gather: a cluster-based approach to browsing large document collections. In: Proceedings of the 15th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1992. p. 318–29.
Dumais ST, Cutrell E, Chen H. Optimizing search by showing results in context. In: Proceedings of the SIGCHI Conference on Human Factors in Computing Systems; 2001. p. 277–84.
Ferragina P, Gulli A. A personalized search engine based on Web-snippet hierarchical clustering. In: Proceedings of the 14th International World Wide Web Conference; 2005. p. 801–10.
Hearst MA, Pedersen JO. Reexamining the cluster hypothesis: scatter/gather on retrieval results. In: Proceedings of the 19th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 1996. p. 76–84.
Hoad T, Zobel J. Methods for identifying versioned and plagiarised documents. J Am Soc Inf Sci Technol. 2003;54(3):203–15.
Huffman S, Lehman A, Stolboushkin A, Wong-Toi H, Yang F, Roehrig H. Multiple-signal duplicate detection for search evaluation. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2007. p. 223–30.
Jardine N, van Rijsbergen C. The use of hierarchic clustering in information retrieval. Inf Storage Retr. 1971;7(5):217–40.
Manber U. Finding similar files in a large file system. In: Proceedings of the USENIX Winter 1994 Technical Conference; 1994. p. 1–10.
Mei Q, Shen X, Zhai C. Automatic labeling of multinomial topic models. In: Proceedings of the 13th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; 2007. p. 490–9.
Shivakumar N, Garcia-Molina H. SCAM: a copy detection mechanism for digital documents. In: Proceedings of the 2nd International Conference in Theory and Practice of Digital Libraries; 1995.
Wang X, Zhai C. Learn from web search logs to organize search results. In: Proceedings of the 30th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval; 2007. p. 87–94.
Willett P. Recent trends in hierarchic document clustering: a critical review. Inf Process Manag. 1988;24(5):577–97.
Zamir O, Etzioni O. Grouper: a dynamic clustering interface to Web search results. In: Proceedings of the 8th International World Wide Web Conference; 1999.
Copyright information
© 2018 Springer Science+Business Media, LLC, part of Springer Nature
About this entry
Cite this entry
Shen, X., Zhai, C. (2018). Web Search Result De-duplication and Clustering. In: Liu, L., Özsu, M.T. (eds) Encyclopedia of Database Systems. Springer, New York, NY. https://doi.org/10.1007/978-1-4614-8265-9_326
DOI: https://doi.org/10.1007/978-1-4614-8265-9_326
Publisher Name: Springer, New York, NY
Print ISBN: 978-1-4614-8266-6
Online ISBN: 978-1-4614-8265-9
eBook Packages: Computer Science; Reference Module Computer Science and Engineering