
Mining “Hidden Phrase” Definitions from the Web

  • Conference paper
Web Technologies and Applications (APWeb 2003)

Part of the book series: Lecture Notes in Computer Science ((LNCS,volume 2642))


Abstract

Keyword searching is the most common form of document search on the Web. Many Web publishers manually annotate the META tags and titles of their pages with frequently queried phrases in order to improve their placement and ranking. A “hidden phrase” is defined as a phrase that occurs in the META tag of a Web page but not in its body. In this paper we present an algorithm that mines the definitions of hidden phrases from Web documents. Phrase definitions allow (i) publishers to find relevant phrases with high query frequency and (ii) search engines to test whether the content of a document's body matches the phrases. We use co-occurrence clustering and association rule mining algorithms to learn phrase definitions from high-dimensional data sets. We also provide experimental results.
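The definition of a hidden phrase given in the abstract can be illustrated with a minimal sketch. It is not the paper's mining algorithm; it only checks which META phrases are absent from the body of a single page. Several choices here are assumptions made for illustration: phrases are read from a `<meta name="keywords">` tag and split on commas, matching is case-insensitive substring matching on whitespace-normalized text, and the names `hidden_phrases`, `_PageParser`, and `PAGE` are hypothetical.

```python
from html.parser import HTMLParser


class _PageParser(HTMLParser):
    """Collect META keyword phrases and visible body text from one page."""

    def __init__(self):
        super().__init__()
        self.meta_phrases = []
        self.body_chunks = []
        self._in_body = False
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and (attrs.get("name") or "").lower() == "keywords":
            # Assumption: phrases are comma-separated in the content attribute.
            content = attrs.get("content") or ""
            for phrase in content.split(","):
                phrase = " ".join(phrase.lower().split())
                if phrase:
                    self.meta_phrases.append(phrase)
        elif tag == "body":
            self._in_body = True
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self._in_body = False
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only text that a reader would see in the body.
        if self._in_body and not self._skip_depth:
            self.body_chunks.append(data)


def hidden_phrases(html):
    """Return META phrases that do not occur in the page body."""
    parser = _PageParser()
    parser.feed(html)
    parser.close()
    body = " ".join(" ".join(parser.body_chunks).lower().split())
    return [p for p in parser.meta_phrases if p not in body]


# Hypothetical example page.
PAGE = """<html><head><title>Rome travel</title>
<meta name="keywords" content="cheap flights, Rome hotels, car rental">
</head><body><h1>Rome hotels</h1>
<p>Compare rome hotels and book online.</p></body></html>"""

print(hidden_phrases(PAGE))
```

Substring matching is deliberately crude: a phrase also counts as present when it appears inside a longer word, so a token-based comparison may be preferable on real pages.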




Copyright information

© 2003 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Nguyen, H.V., Velamuru, P., Kolippakkam, D., Davulcu, H., Liu, H., Ates, M. (2003). Mining “Hidden Phrase” Definitions from the Web. In: Zhou, X., Orlowska, M.E., Zhang, Y. (eds) Web Technologies and Applications. APWeb 2003. Lecture Notes in Computer Science, vol 2642. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-36901-5_17


  • DOI: https://doi.org/10.1007/3-540-36901-5_17


  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-02354-8

  • Online ISBN: 978-3-540-36901-1

  • eBook Packages: Springer Book Archive
