Abstract
Keyword searching is the most common form of document search on the Web. Many Web publishers manually annotate the META tags and titles of their pages with frequently queried phrases in order to improve their placement and ranking. A “hidden phrase” is defined as a phrase that occurs in the META tag of a Web page but not in its body. In this paper we present an algorithm that mines the definitions of hidden phrases from Web documents. Phrase definitions allow (i) publishers to find relevant phrases with high query frequency, and (ii) search engines to test whether the content of the body of a document matches the phrases. We use co-occurrence clustering and association rule mining algorithms to learn phrase definitions from high-dimensional data sets. We also provide experimental results.
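To make the definition of a hidden phrase concrete, the following is a minimal sketch of how one might detect such phrases on a single page. It is not the mining algorithm of the paper, only an illustration of the definition. It assumes that phrases are listed comma-separated in a `<meta name="keywords">` tag and that a case-insensitive substring test against the visible body text is an acceptable notion of "occurs in the body"; the names `hidden_phrases` and `_PageParser` are hypothetical.

```python
from html.parser import HTMLParser


class _PageParser(HTMLParser):
    """Collects META keyword phrases and visible body text from one page."""

    def __init__(self):
        super().__init__()
        self.meta_phrases = []   # phrases from <meta name="keywords">
        self.body_chunks = []    # text fragments found inside <body>
        self._in_body = False
        self._skip_depth = 0     # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # Assumption: phrases are comma-separated in the content attribute.
            content = attrs.get("content") or ""
            for phrase in content.split(","):
                normalized = " ".join(phrase.lower().split())
                if normalized:
                    self.meta_phrases.append(normalized)
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that a reader would see in the body.
        if self._in_body and not self._skip_depth:
            self.body_chunks.append(data)


def hidden_phrases(html):
    """Return META keyword phrases that do not occur in the page body."""
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    # Lowercase and collapse whitespace so line breaks do not split phrases.
    body = " ".join(" ".join(parser.body_chunks).lower().split())
    # Simplification: plain substring matching, no stemming or tokenization.
    return [p for p in parser.meta_phrases if p not in body]


page = (
    '<html><head><meta name="keywords" '
    'content="cheap flights, Travel Deals, hotel"></head>'
    "<body><p>Find a hotel and   travel\ndeals here.</p></body></html>"
)
print(hidden_phrases(page))
```

Because the check is a substring test, a keyword such as "hotel" would also be considered present when the body only contains "hotels"; a stricter variant could compare token sequences instead.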
Copyright information
© 2003 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Nguyen, H.V., Velamuru, P., Kolippakkam, D., Davulcu, H., Liu, H., Ates, M. (2003). Mining “Hidden Phrase” Definitions from the Web. In: Zhou, X., Orlowska, M.E., Zhang, Y. (eds) Web Technologies and Applications. APWeb 2003. Lecture Notes in Computer Science, vol 2642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36901-5_17
DOI: https://doi.org/10.1007/3-540-36901-5_17
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-02354-8
Online ISBN: 978-3-540-36901-1
eBook Packages: Springer Book Archive