Advertising Keywords Extraction from Web Pages

  • Conference paper
Web Information Systems and Mining (WISM 2010)

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6318)

Abstract

A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the page text, and this has become a rapidly growing business in recent years. We describe a system that learns how to extract keywords from web pages for advertisement targeting. First, a text network is built for a single web page; then PageRank is applied to the network to determine the importance of each word; finally, the top-ranked words are selected as the keywords of the page. The algorithm is tested on a corpus of blog pages, and the experimental results indicate that it is practical and effective.
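
The three steps named in the abstract (build a text network, run PageRank, keep the top-ranked words) can be illustrated with a minimal sketch. The abstract does not say how the text network is constructed, so the sketch assumes an undirected word co-occurrence graph over a sliding window, a small hand-picked stopword list, and the conventional damping factor of 0.85. The names `build_text_network`, `pagerank`, `extract_keywords`, and `STOPWORDS` are hypothetical and are not taken from the paper.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stopword list; the paper's filtering is not specified.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "on", "for", "is", "it", "with"}


def build_text_network(text, window=2):
    """Build an undirected co-occurrence graph: words are nodes, and two
    distinct words are linked if they appear within `window` positions."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window + 1, len(words))):
            other = words[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank on an undirected graph.
    Every node in `graph` has at least one neighbour, so there are no
    dangling nodes and the scores keep summing to 1."""
    nodes = list(graph)
    if not nodes:
        return {}
    rank = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(iterations):
        new_rank = {}
        for n in nodes:
            # Each neighbour m shares its score equally among its own neighbours.
            incoming = sum(rank[m] / len(graph[m]) for m in graph[n])
            new_rank[n] = (1.0 - damping) / len(nodes) + damping * incoming
        rank = new_rank
    return rank


def extract_keywords(text, top_k=5, window=2):
    """Return the `top_k` highest-ranked words, ties broken alphabetically."""
    rank = pagerank(build_text_network(text, window))
    return sorted(rank, key=lambda w: (-rank[w], w))[:top_k]
```

Calling `extract_keywords(page_text, top_k=5)` on the visible text of a page returns its five highest-ranked words under these assumptions. A real system would likely add HTML stripping, stemming, and phrase handling, none of which is described in the abstract.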

Copyright information

© 2010 Springer-Verlag Berlin Heidelberg

About this paper

Cite this paper

Liu, J., Wang, C., Liu, Z., Yao, W. (2010). Advertising Keywords Extraction from Web Pages. In: Wang, F.L., Gong, Z., Luo, X., Lei, J. (eds) Web Information Systems and Mining. WISM 2010. Lecture Notes in Computer Science, vol 6318. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16515-3_42

  • DOI: https://doi.org/10.1007/978-3-642-16515-3_42

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-642-16514-6

  • Online ISBN: 978-3-642-16515-3

  • eBook Packages: Computer Science (R0)
