Abstract
Monitoring the online reputation of a company starts by retrieving all (fresh) information where the company is mentioned. A major problem in this context is that company names are often ambiguous (apple may refer to the company, the fruit, the singer, etc.). The problem is particularly hard in microblogging, where there is little context to disambiguate; this was the task addressed in the WePS-3 CLEF lab exercise in 2010. This paper introduces a novel fingerprint representation technique to visualize and compare system results for the task. We apply this technique to the systems that originally participated in WePS-3, and then use it to explore the usefulness of two signals for addressing the problem: filter keywords (those whose presence in a tweet reliably signals either the positive or the negative class) and the majority class (whether positive or negative tweets are predominant for a given company name in a tweet stream). Our study shows that both are key signals for solving the task. We also find that, remarkably, the vocabulary associated with a company on the Web does not seem to match the vocabulary used in Twitter streams: even a manual extraction of filter keywords from web pages has substantially lower recall than an oracle selection of the best terms from the Twitter stream.
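The two signals named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the keyword sets, the whitespace tokenization, and the rule that a positive keyword wins when both kinds appear are all assumptions made for illustration. A tweet containing a filter keyword is labeled by that keyword, and every other tweet falls back to the majority class of the stream.

```python
from collections import Counter


def majority_class(labels):
    """Return the most frequent label in a list of 'positive'/'negative' labels."""
    return Counter(labels).most_common(1)[0][0]


def classify(tweet, positive_keywords, negative_keywords, fallback_label):
    """Label a tweet using filter keywords, else fall back to the majority class.

    Hypothetical helper: tokenization is plain lowercase whitespace splitting,
    and a positive keyword takes precedence if both kinds are present.
    """
    tokens = set(tweet.lower().split())
    if tokens & positive_keywords:
        return "positive"
    if tokens & negative_keywords:
        return "negative"
    return fallback_label


# Illustrative keyword sets for the ambiguous name "apple" (made up for this sketch).
positive_keywords = {"iphone", "mac"}
negative_keywords = {"pie", "juice"}

# Majority class estimated from a few already-labeled tweets (made-up labels).
fallback = majority_class(["positive", "positive", "negative"])

label_a = classify("New iPhone from apple", positive_keywords, negative_keywords, fallback)
label_b = classify("apple pie recipe", positive_keywords, negative_keywords, fallback)
label_c = classify("i love apple", positive_keywords, negative_keywords, fallback)
```

In this sketch the third tweet contains no filter keyword, so its label comes entirely from the majority class, which is why the abstract treats the two signals as complementary.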
This research was partially supported by the Spanish Ministry of Education via a doctoral grant to the first author (AP2009-0507) and the Spanish Ministry of Science and Innovation (Holopedia Project, TIN2010-21128-C02).
Copyright information
© 2011 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Spina, D., Amigó, E., Gonzalo, J. (2011). Filter Keywords and Majority Class Strategies for Company Name Disambiguation in Twitter. In: Forner, P., Gonzalo, J., Kekäläinen, J., Lalmas, M., de Rijke, M. (eds) Multilingual and Multimodal Information Access Evaluation. CLEF 2011. Lecture Notes in Computer Science, vol 6941. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-23708-9_7
DOI: https://doi.org/10.1007/978-3-642-23708-9_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-642-23707-2
Online ISBN: 978-3-642-23708-9
eBook Packages: Computer Science, Computer Science (R0)