A Multi-strategy Learning Approach to Competitor Identification

Ruan, Tong; Lin, Yeli; Wang, Haofen; Pan, Jeff Z.

doi:10.1007/978-3-319-15615-6_15

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 8943)

Included in the following conference series:

Joint International Semantic Technology Conference

Abstract

Competitor identification aims to find the competitors of an entity in a given field, and it is key to the success of market intelligence. Collecting competitors manually is labor-intensive and time-consuming, so automatic approaches have been proposed. However, these approaches face two main challenges. First, competitor information may appear not only in semi-structured sources such as lists and tables but also in free text, and this diversity of sources makes competitor identification difficult. Second, competitors do not always appear under their full names; name variants further increase the diversity and make the task more challenging. In this paper, we propose a novel unsupervised approach that identifies competitors from prospectuses using a multi-strategy learning algorithm. More precisely, we first extract competitors from lists using predefined heuristic rules. Exploiting the redundancy of competitor information across lists, tables, and texts, we then use these competitors as seeds to distantly supervise the learning of table columns and text patterns that contain competitors. The whole process is performed iteratively: in each iteration, newly discovered high-confidence competitors from the various sources are treated as new seeds for bootstrapping. Experimental results show that our approach is effective without human intervention or external knowledge bases, and that it significantly outperforms traditional named entity recognition approaches.
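The bootstrapping loop described in the abstract can be illustrated with a small sketch. This is a hypothetical Python illustration under stated assumptions, not the authors' implementation: the `Document` structure, the rule that treats items under a list heading mentioning "competitor" as seeds, the suffix-stripping `normalize` function for name variants, the column-overlap threshold `min_overlap`, the two-word left-context text patterns, and the `min_support` confidence criterion are all assumptions chosen for illustration. It shows how seeds from lists can distantly label table columns and text contexts, and how newly found names are fed back as seeds until no new competitors appear.

```python
import re
from dataclasses import dataclass, field

# Hypothetical corporate suffixes dropped when normalizing name variants.
SUFFIXES = {"inc", "ltd", "co", "corp", "corporation", "limited"}
# A run of capitalized words, used here as a crude candidate-name detector (assumption).
NAME_RUN = re.compile(r"[A-Z][\w&.-]*(?:\s+[A-Z][\w&.-]*)*")


def normalize(name):
    """Map a name variant to a canonical key, e.g. 'Acme Corp.' -> 'acme'."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    while tokens and tokens[-1] in SUFFIXES:
        tokens.pop()
    return " ".join(tokens)


@dataclass
class Document:
    lists: list = field(default_factory=list)      # (heading, items) pairs
    tables: list = field(default_factory=list)     # tables as lists of columns; cell 0 is a header
    sentences: list = field(default_factory=list)  # free-text sentences


def seeds_from_lists(doc):
    """Hypothetical heuristic rule: items of a list whose heading mentions 'competitor'."""
    seeds = set()
    for heading, items in doc.lists:
        if "competitor" in heading.lower():
            seeds.update(k for k in map(normalize, items) if k)
    return seeds


def from_tables(doc, known, min_overlap):
    """Accept all cells of a column whose non-header cells overlap enough with known seeds."""
    found = set()
    for table in doc.tables:
        for column in table:
            keys = [k for k in map(normalize, column[1:]) if k]
            if keys and sum(k in known for k in keys) / len(keys) >= min_overlap:
                found.update(keys)
    return found


def mentions(sentence, width):
    """Yield (left-context pattern, normalized name) for each capitalized word run."""
    for m in NAME_RUN.finditer(sentence):
        before = [re.sub(r"\W", "", w.lower()) for w in sentence[:m.start()].split()]
        if len(before) >= width:
            yield tuple(before[-width:]), normalize(m.group())


def learn_patterns(docs, known, min_support, width=2):
    """Keep left contexts that precede at least `min_support` distinct known competitors."""
    support = {}
    for doc in docs:
        for sentence in doc.sentences:
            for pattern, key in mentions(sentence, width):
                if key in known:
                    support.setdefault(pattern, set()).add(key)
    return {p for p, keys in support.items() if len(keys) >= min_support}


def apply_patterns(doc, patterns, width=2):
    """Extract every name that follows an accepted left-context pattern."""
    return {key for s in doc.sentences for pattern, key in mentions(s, width)
            if key and pattern in patterns}


def identify_competitors(docs, min_overlap=0.5, min_support=2, max_rounds=10):
    """Bootstrap: list seeds -> table columns and text patterns -> new seeds."""
    known = set().union(*(seeds_from_lists(d) for d in docs))
    for _ in range(max_rounds):
        patterns = learn_patterns(docs, known, min_support)
        new = set()
        for doc in docs:
            new |= from_tables(doc, known, min_overlap)
            new |= apply_patterns(doc, patterns)
        new -= known
        if not new:
            break  # fixed point: no unseen competitors found in this round
        known |= new
    return known


# Toy, made-up input for illustration only.
example = Document(
    lists=[("Main competitors", ["Acme Corp.", "Globex Inc."])],
    tables=[[["Company", "Acme Corp", "Globex", "Initech Ltd"], ["Share", "30%", "20%", "10%"]]],
    sentences=["We compete with Acme Corp in Asia.",
               "We compete with Initech Ltd in Europe.",
               "We compete with Umbrella Corp in Africa."],
)
print(sorted(identify_competitors([example])))  # ['acme', 'globex', 'initech', 'umbrella']
```

On this toy input, the first round finds "Initech" only through the table column; that new seed then gives the pattern "compete with" enough support to extract "Umbrella" in the second round. Requiring several distinct seeds per pattern is just one simple stand-in for "high confidence"; raising `min_support` trades recall for precision.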

Author information

Authors and Affiliations

East China University of Science and Technology, Shanghai, 200237, China
Tong Ruan, Yeli Lin & Haofen Wang
University of Aberdeen, Aberdeen, Scotland
Jeff Z. Pan

Corresponding author

Correspondence to Tong Ruan.

Editor information

Editors and Affiliations

National Electronics and Computer Technology Center, Pathum Thani, Thailand
Thepchai Supnithi
Keio University Fac. of Science & Technology, Yokohama, Japan
Takahira Yamaguchi
Department of Computing Science, University of Aberdeen, Aberdeen, United Kingdom
Jeff Z. Pan
Asian University, Chonburi, Thailand
Vilas Wuwongse
National Electronics Computer Tech Ctr, Sci Park, Pathumthani, Thailand
Marut Buranarach

About this paper

Cite this paper

Ruan, T., Lin, Y., Wang, H., Pan, J.Z. (2015). A Multi-strategy Learning Approach to Competitor Identification. In: Supnithi, T., Yamaguchi, T., Pan, J., Wuwongse, V., Buranarach, M. (eds) Semantic Technology. JIST 2014. Lecture Notes in Computer Science, vol 8943. Springer, Cham. https://doi.org/10.1007/978-3-319-15615-6_15

DOI: https://doi.org/10.1007/978-3-319-15615-6_15
Published: 21 February 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-15614-9
Online ISBN: 978-3-319-15615-6
eBook Packages: Computer Science, Computer Science (R0)