Advertising Keywords Extraction from Web Pages

Liu, Jianyi; Wang, Cong; Liu, Zhengyang; Yao, Wenbin

doi:10.1007/978-3-642-16515-3_42

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 6318)

Included in the following conference series:

International Conference on Web Information Systems and Mining

Abstract

A large and growing number of web pages display contextual advertising based on keywords automatically extracted from the page text, and this has become a rapidly growing business in recent years. We describe a system that learns how to extract keywords from web pages for advertisement targeting. First, a text network is built for a single web page. Then PageRank is applied to the network to determine the importance of each word. Finally, the top-ranked words are selected as the keywords of the page. The algorithm is tested on a corpus of blog pages, and the experimental results show that it is practical and effective.
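The three steps in the abstract (build a word network, run PageRank, keep the top-ranked words) can be sketched as follows. This is a minimal illustration, not the authors' implementation: the abstract does not say how edges are defined, so the sketch assumes a TextRank-style rule that links words co-occurring inside a sliding window of `window` consecutive tokens. It also assumes a small hypothetical English `STOPWORDS` list and plain power-iteration PageRank. The function names `build_text_network`, `pagerank`, and `extract_keywords` are hypothetical.

```python
import re

# Hypothetical English stop-word list (assumption): the abstract does not
# describe the paper's preprocessing.
STOPWORDS = {
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "for",
    "is", "are", "was", "with", "this", "that", "it", "at", "by",
}


def build_text_network(text, window=2):
    """Step 1: build an undirected word co-occurrence network for one page.

    Assumption: two words are linked when they co-occur inside a sliding
    window of `window` consecutive tokens after stop-word removal.
    """
    tokens = [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]
    graph = {}
    for i, word in enumerate(tokens):
        graph.setdefault(word, set())  # keep words without neighbours as nodes
        for other in tokens[i + 1 : i + window]:
            if other != word:
                graph[word].add(other)
                graph.setdefault(other, set()).add(word)
    return graph


def pagerank(graph, damping=0.85, max_iter=100, tol=1e-8):
    """Step 2: score each word with PageRank by power iteration."""
    nodes = list(graph)
    n = len(nodes)
    if n == 0:
        return {}
    scores = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Each neighbour u passes on its score split evenly over its own links;
        # isolated words receive only the (1 - damping) / n teleport term.
        new = {
            v: (1 - damping) / n
            + damping * sum(scores[u] / len(graph[u]) for u in graph[v])
            for v in nodes
        }
        delta = sum(abs(new[v] - scores[v]) for v in nodes)
        scores = new
        if delta < tol:
            break
    return scores


def extract_keywords(text, k=5, window=2):
    """Step 3: return the k top-ranked words (ties broken alphabetically)."""
    scores = pagerank(build_text_network(text, window))
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:k]]


if __name__ == "__main__":
    page = "hotel paris hotel rome hotel london hotel berlin"
    print(extract_keywords(page, k=3))
```

In the usage example, "hotel" is adjacent to every city, so it becomes the hub of a star-shaped network and ranks first. Widening `window` adds more edges per word, which is one plausible knob for tuning such a network. The value the paper actually uses is not stated here.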

Author information

Authors and Affiliations

School of Computer Science, Beijing University of Posts and Telecommunications, 100876, Beijing, China
Jianyi Liu, Cong Wang, Zhengyang Liu & Wenbin Yao

Editor information

Editors and Affiliations

Department of Business Administration, Caritas Francis Hsu College, 18 Chui Ling Road, Tseung Kwan O, Hong Kong, China
Fu Lee Wang
Department of Computer and Information Science, University of Macau, Av. Padre Tomás Pereira, Taipa, Macau, SAR, China
Zhiguo Gong
School of Computer, Shanghai University, 99 Shangda Road, 200444, Shanghai, China
Xiangfeng Luo
School of Computer, Nanjing University of Posts and Telecommunications, 210003, Nanjing, China
Jingsheng Lei

About this paper

Cite this paper

Liu, J., Wang, C., Liu, Z., Yao, W. (2010). Advertising Keywords Extraction from Web Pages. In: Wang, F.L., Gong, Z., Luo, X., Lei, J. (eds) Web Information Systems and Mining. WISM 2010. Lecture Notes in Computer Science, vol 6318. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-16515-3_42

DOI: https://doi.org/10.1007/978-3-642-16515-3_42
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-16514-6
Online ISBN: 978-3-642-16515-3
eBook Packages: Computer Science, Computer Science (R0)
