Knowledge-Based Metrics for Document Classification: Online Reviews Experiments

  • Conference paper

Part of the book series: Studies in Computational Intelligence ((SCI,volume 798))

Abstract

In this paper we propose a new method for classifying documents by topic. The method relies only on textual measures. We illustrate it on three sets of documents whose topics differ to varying degrees: (i) the first two sets contain reviews that comment on the features of electronic devices, namely laptops and mobile phones; (ii) the third set contains reviews of tourist locations. All review texts are written in Romanian and were extracted by crawling popular Romanian sites. The paper presents and discusses the evaluation scores obtained by applying the textual measures.
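The abstract does not name the specific textual measures, so the following is only a minimal sketch of how a purely text-based score could assign a review to a topic. It assumes cosine similarity over raw term frequencies, which may differ from the measures used in the paper. The function names, the topic profiles, and the English toy sentences are all hypothetical and chosen for illustration.

```python
import math
from collections import Counter


def term_frequencies(text):
    """Lowercase the text, split on whitespace, and count tokens."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(review, topic_profiles):
    """Return the topic whose profile is most similar to the review."""
    review_tf = term_frequencies(review)
    return max(
        topic_profiles,
        key=lambda topic: cosine_similarity(review_tf, topic_profiles[topic]),
    )


# Hypothetical topic profiles (invented toy vocabulary, not data from the paper).
topic_profiles = {
    "laptops": term_frequencies("laptop keyboard screen battery processor memory"),
    "phones": term_frequencies("phone camera screen battery signal apps"),
    "tourism": term_frequencies("hotel beach room breakfast staff location"),
}

print(classify("the hotel room was clean and the beach was close", topic_profiles))
```

In this toy setting the two electronics profiles share vocabulary such as "screen" and "battery", while the tourism profile shares none, which mirrors the idea of topics that differ to varying degrees.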

Notes

  1. http://www.laurenceanthony.net/.

  2. “Research shows that 91% of people regularly or occasionally read online reviews, and 84% trust online reviews as much as a personal recommendation,” says Craig Bloem, Founder and CEO at FreeLogoServices.com.

  3. http://amfostacolo.ro.

  4. Because these sites operate commercially, we chose not to name them and to anonymize all data that could be commercially sensitive.

Author information

Corresponding author

Correspondence to Mihaela Colhon.

Copyright information

© 2018 Springer Nature Switzerland AG

About this paper

Cite this paper

Colhon, M., Bădică, C. (2018). Knowledge-Based Metrics for Document Classification: Online Reviews Experiments. In: Del Ser, J., Osaba, E., Bilbao, M., Sanchez-Medina, J., Vecchio, M., Yang, XS. (eds) Intelligent Distributed Computing XII. IDC 2018. Studies in Computational Intelligence, vol 798. Springer, Cham. https://doi.org/10.1007/978-3-319-99626-4_37

  • DOI: https://doi.org/10.1007/978-3-319-99626-4_37

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-319-99625-7

  • Online ISBN: 978-3-319-99626-4

  • eBook Packages: Engineering, Engineering (R0)
