Mining “Hidden Phrase” Definitions from the Web

Nguyen, Hung. V.; Velamuru, P.; Kolippakkam, D.; Davulcu, H.; Liu, H.; Ates, M.

doi:10.1007/3-540-36901-5_17

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 2642)

Included in the following conference series:

Asia-Pacific Web Conference

Abstract

Keyword searching is the most common form of document search on the Web. Many Web publishers manually annotate the META tags and titles of their pages with frequently queried phrases in order to improve their placement and ranking. A “hidden phrase” is defined as a phrase that occurs in the META tag of a Web page but not in its body. In this paper we present an algorithm that mines the definitions of hidden phrases from Web documents. Phrase definitions allow (i) publishers to find relevant phrases with high query frequency, and (ii) search engines to test whether the content of the body of a document matches the phrases. We use co-occurrence clustering and association rule mining algorithms to learn phrase definitions from high-dimensional data sets. We also provide experimental results.
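The definition of a hidden phrase given in the abstract can be illustrated with a minimal sketch. This is not the authors' algorithm: it only shows the definition itself, under the assumptions that phrases are the comma-separated entries of a `<meta name="keywords">` tag and that "occurs in the body" means a case-insensitive substring match on the visible body text. The names `hidden_phrases` and `_PageText` are hypothetical.

```python
from html.parser import HTMLParser


class _PageText(HTMLParser):
    """Collect META keyword phrases and the visible text inside <body>."""

    def __init__(self):
        super().__init__()
        self.meta_phrases = []
        self.body_chunks = []
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # Assumption: phrases are comma-separated in the content attribute.
            content = attrs.get("content") or ""
            for phrase in content.split(","):
                phrase = " ".join(phrase.lower().split())
                if phrase:
                    self.meta_phrases.append(phrase)
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._in_body and not self._skip_depth:
            self.body_chunks.append(data)


def hidden_phrases(html):
    """Return META keyword phrases that do not occur in the page body."""
    parser = _PageText()
    parser.feed(html)
    # Normalise case and whitespace so that "last   minute" matches "last minute".
    body = " ".join(" ".join(parser.body_chunks).lower().split())
    result = []
    for phrase in parser.meta_phrases:
        if phrase not in body and phrase not in result:
            result.append(phrase)
    return result


page = """<html><head><title>Cheap flights</title>
<meta name="keywords" content="cheap flights, discount airfare, Last Minute Deals">
</head><body><h1>Cheap flights to Europe</h1>
<p>Book last   minute deals today.</p></body></html>"""

print(hidden_phrases(page))  # ['discount airfare']
```

Substring matching is deliberately crude (it would treat "car" as occurring in "cargo"); a tokenised, word-boundary match would be a reasonable refinement.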

Author information

Authors and Affiliations

Department of Computer Science and Engineering, Arizona State University, Tempe, AZ, 85287, USA
Hung. V. Nguyen, P. Velamuru, D. Kolippakkam, H. Davulcu & H. Liu
Cash-Us.com, 21 Helen Way, Berkeley Heights, NJ, 07922, USA
M. Ates

Editor information

Editors and Affiliations

School of Information Technology and Electrical Engineering, The University of Queensland, Brisbane, QLD, 4072, Australia
Xiaofang Zhou & Maria E. Orlowska
Department of Mathematics and Computing, University of Southern Queensland, Toowoomba, QLD, 4350, Australia
Yanchun Zhang

About this paper

Cite this paper

Nguyen, H.V., Velamuru, P., Kolippakkam, D., Davulcu, H., Liu, H., Ates, M. (2003). Mining “Hidden Phrase” Definitions from the Web. In: Zhou, X., Orlowska, M.E., Zhang, Y. (eds) Web Technologies and Applications. APWeb 2003. Lecture Notes in Computer Science, vol 2642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36901-5_17

DOI: https://doi.org/10.1007/3-540-36901-5_17
Published: 15 April 2003
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-02354-8
Online ISBN: 978-3-540-36901-1
eBook Packages: Springer Book Archive
