Can predicate-argument structures be used for contextual opinion retrieval from blogs?

World Wide Web

Abstract

We investigate the use of predicate-argument structures for contextual opinion retrieval. The approach is novel in that it exploits the grammatical derivation of sentences to establish both contextual and subjective relevance. Unlike keyword-based opinion retrieval approaches, we do not rely on the frequency of particular keywords; instead, our solution is based on the frequency of contextually relevant and subjective sentences. We use a linear relevance model that leverages semantic similarities among the predicate-argument structures of sentences, and this paper presents the evaluation results of that model. The model linearly combines a popular relevance model, our proposed transformed terms similarity model, and the absolute value of a sentence subjectivity scoring scheme. The predicate-argument structures are derived from the grammatical derivations of natural language query topics and of well-formed sentences from blog documents. The derived structures are then compared semantically to compute an opinion relevance score. Our scoring technique uses the highest frequency of semantically related predicate-argument structures, enriched with the total subjectivity score of the sentences. Experimental results show that predicate-argument structures can indeed be used for contextual opinion retrieval: they improve the performance of the opinion retrieval task by 15% over the popular TREC baselines.
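
The linear combination described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the weights, the score ranges, the names `SentenceScores`, `linear_opinion_score` and `document_score`, and the choice to sum sentence scores per document are all assumptions made for illustration. The abstract gives only the three components (a relevance model score, a transformed terms similarity score, and the absolute value of a subjectivity score).

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class SentenceScores:
    """Per-sentence component scores (hypothetical container)."""
    relevance: float     # score from a standard relevance model (assumed in [0, 1])
    similarity: float    # predicate-argument similarity to the query (assumed in [0, 1])
    subjectivity: float  # signed subjectivity score; the sign is taken to encode polarity


def linear_opinion_score(s: SentenceScores,
                         w_rel: float = 0.4,
                         w_sim: float = 0.4,
                         w_subj: float = 0.2) -> float:
    """Linearly combine the three components.

    The weights are placeholders, not values from the paper. The absolute
    value makes strongly negative and strongly positive sentences count
    equally as opinionated.
    """
    return (w_rel * s.relevance
            + w_sim * s.similarity
            + w_subj * abs(s.subjectivity))


def document_score(sentences: Iterable[SentenceScores]) -> float:
    """Aggregate sentence scores by summation (an assumed aggregation)."""
    return sum(linear_opinion_score(s) for s in sentences)


if __name__ == "__main__":
    doc = [SentenceScores(0.5, 0.5, -1.0), SentenceScores(0.2, 0.1, 0.0)]
    print(document_score(doc))
```

Taking the absolute value of the subjectivity score means the sketch ranks by how opinionated a sentence is, not by whether the opinion is positive or negative.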

Author information

Corresponding author

Correspondence to Sylvester O. Orimaye.

About this article

Cite this article

Orimaye, S.O., Alhashmi, S.M. & Siew, EG. Can predicate-argument structures be used for contextual opinion retrieval from blogs?. World Wide Web 16, 763–791 (2013). https://doi.org/10.1007/s11280-012-0170-8

  • DOI: https://doi.org/10.1007/s11280-012-0170-8
