Abstract
We present the results of our investigation into the use of predicate-argument structures for contextual opinion retrieval. This is a novel approach that exploits the grammatical derivation of sentences to establish contextual and subjective relevance. Unlike keyword-based opinion retrieval approaches, we do not rely on the frequency of certain keywords; instead, our solution is based on the frequency of contextually relevant and subjective sentences. We use a linear relevance model that leverages semantic similarities among the predicate-argument structures of sentences, and this paper presents the evaluation results of that model. The model linearly combines a popular relevance model, our proposed transformed terms similarity model, and the absolute value of a sentence subjectivity scoring scheme. The predicate-argument structures are derived from the grammatical derivations of natural language query topics and of well-formed sentences from blog documents. The derived structures are then compared semantically to compute an opinion relevance score. Our scoring technique uses the highest frequency of semantically related predicate-argument structures, enriched with the total subjectivity score of the sentences. Experimental results show that predicate-argument structures can indeed be used for contextual opinion retrieval, improving the performance of the opinion retrieval task by 15% over the popular TREC baselines.
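The linear combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the class `SentenceScores`, the functions `linear_relevance` and `document_score`, the weights, and the similarity threshold are all hypothetical, and the three per-sentence component scores are assumed to have been computed elsewhere. The abstract also does not state how sentence scores are aggregated, so summing over sentences that pass a threshold is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SentenceScores:
    """Per-sentence component scores (all assumed to be precomputed)."""
    relevance: float     # score from a standard relevance model
    similarity: float    # predicate-argument similarity to the query topic
    subjectivity: float  # signed subjectivity score (sign encodes polarity)


def linear_relevance(s: SentenceScores,
                     w_rel: float = 0.5,
                     w_sim: float = 0.25,
                     w_subj: float = 0.25) -> float:
    """Weighted sum of the three components.

    The absolute value of the subjectivity score is used, so strongly
    negative and strongly positive sentences count equally as opinionated.
    The weights are illustrative placeholders, not values from the paper.
    """
    return (w_rel * s.relevance
            + w_sim * s.similarity
            + w_subj * abs(s.subjectivity))


def document_score(sentences: Iterable[SentenceScores],
                   sim_threshold: float = 0.5) -> float:
    """Sum the scores of sentences judged contextually relevant.

    A sentence counts as contextually relevant when its similarity to the
    query reaches `sim_threshold` (a hypothetical cut-off).
    """
    return sum(linear_relevance(s)
               for s in sentences
               if s.similarity >= sim_threshold)
```

With these placeholder weights, a document containing more sentences that are both similar to the query and subjective receives a higher score, which mirrors the abstract's emphasis on sentence frequency rather than keyword frequency.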
Cite this article
Orimaye, S.O., Alhashmi, S.M. & Siew, EG. Can predicate-argument structures be used for contextual opinion retrieval from blogs?. World Wide Web 16, 763–791 (2013). https://doi.org/10.1007/s11280-012-0170-8
DOI: https://doi.org/10.1007/s11280-012-0170-8