Abstract
A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the text of the page, and contextual advertising has become a rapidly growing business in recent years. We describe a system that learns how to extract keywords from web pages for advertisement targeting. First, a text network is built for a single web page; then PageRank is applied to the network to score the importance of each word; finally, the top-ranked words are selected as the keywords of the page. The algorithm is tested on a corpus of blog pages, and the experimental results indicate that it is practical and effective.
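The three steps named in the abstract (build a text network, run PageRank, keep the top-ranked words) can be illustrated with a minimal Python sketch. The abstract does not say how the network is constructed, so everything specific here is an assumption made for illustration: undirected edges between words that co-occur within a small sliding window, a tiny hand-picked stopword list, a damping factor of 0.85, and a fixed number of power iterations. The function names `build_text_network`, `pagerank`, and `extract_keywords` are hypothetical and are not taken from the paper.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stopword list, not the one used in the paper.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def build_text_network(text, window=2):
    """Build an undirected co-occurrence graph (assumed construction).

    Two words are linked when they appear within `window` positions of
    each other after lowercasing and stopword removal.
    """
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Plain power-iteration PageRank over the undirected graph."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        new_rank = {}
        for node in nodes:
            # Each neighbour shares its current rank equally among its own edges.
            incoming = sum(rank[nb] / len(graph[nb]) for nb in graph[node])
            new_rank[node] = (1.0 - damping) / n + damping * incoming
        rank = new_rank
    return rank


def extract_keywords(text, top_k=5, window=2):
    """Return the `top_k` highest-ranked words, ties broken alphabetically."""
    scores = pagerank(build_text_network(text, window))
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    page = ("camera lens review camera battery test "
            "camera zoom price camera sensor quality")
    print(extract_keywords(page, top_k=3))
```

In this sketch a word that co-occurs with many distinct neighbours accumulates rank, which is the intuition behind selecting top-ranked nodes as advertising keywords. A real system would also need HTML stripping, phrase handling, and a proper stopword list, none of which the abstract describes.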
Copyright information
© 2010 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Liu, J., Wang, C., Liu, Z., Yao, W. (2010). Advertising Keywords Extraction from Web Pages. In: Wang, F.L., Gong, Z., Luo, X., Lei, J. (eds) Web Information Systems and Mining. WISM 2010. Lecture Notes in Computer Science, vol 6318. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16515-3_42
DOI: https://doi.org/10.1007/978-3-642-16515-3_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16514-6
Online ISBN: 978-3-642-16515-3
eBook Packages: Computer Science, Computer Science (R0)