A novel framework for multiclass supervised classification of location-sensitive events

  • 1222: Intelligent Multimedia Data Analytics and Computing
Multimedia Tools and Applications

Abstract

In the past few years, location-sensitive information retrieval, which extracts and uses the location information present in unstructured text, has gained significant attention. The task is challenging because documents must be analysed both geographically and thematically: the semantics of the text need to be associated with the location features it contains. This association supports fine-grained analysis of reported events in applications such as tourist location recommendation, disaster surveillance, political activity tracking and happiness indices. Context-based vector space models have recently become important in text mining because they preserve the semantics of a text while representing it in a vector space of a chosen dimension. This paper proposes LDoc2Vec, a framework for multiclass supervised classification of location-sensitive events that integrates context-based vector space models with geographic scope resolution of the events reported in text documents. Variants of the Doc2Vec model are integrated with location features, and their performance in multiclass supervised event classification is analysed. Experimental results with various machine learning classifiers indicate that the proposed framework outperforms baseline Doc2Vec models for multiclass classification of location-sensitive events, as measured by precision, recall and F1-score.
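To make the idea in the abstract concrete, the sketch below shows one way a document vector could be extended with location features, and how per-class precision, recall and F1-score are computed for a multiclass problem. It is an illustration under stated assumptions, not the authors' LDoc2Vec implementation: the abstract does not say how location features are encoded, so the binary place indicators, the function names (`augment_with_location`, `per_class_prf`, `macro_average`), the place names and the event labels are all hypothetical. The input vector stands in for an embedding that a Doc2Vec model would produce.

```python
from collections import Counter
from typing import Dict, List, Sequence, Tuple


def augment_with_location(doc_vec: Sequence[float],
                          places: Sequence[str],
                          known_places: Sequence[str]) -> List[float]:
    """Append one binary indicator per known place to a document vector.

    Assumption: location features are simple presence flags; the paper's
    actual encoding is not described in the abstract.
    """
    mentioned = set(places)
    flags = [1.0 if place in mentioned else 0.0 for place in known_places]
    return list(doc_vec) + flags


def per_class_prf(y_true: Sequence[str],
                  y_pred: Sequence[str]) -> Dict[str, Tuple[float, float, float]]:
    """Return (precision, recall, F1) for every class label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted class gains a false positive
            fn[truth] += 1  # true class gains a false negative
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pred_total = tp[label] + fp[label]
        true_total = tp[label] + fn[label]
        precision = tp[label] / pred_total if pred_total else 0.0
        recall = tp[label] / true_total if true_total else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_average(scores: Dict[str, Tuple[float, float, float]]
                  ) -> Tuple[float, float, float]:
    """Unweighted mean of per-class precision, recall and F1."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


if __name__ == "__main__":
    # Hypothetical 2-dimensional embedding and place list, for illustration only.
    features = augment_with_location([0.1, -0.2], ["Delhi"], ["Mumbai", "Delhi"])
    print(features)

    # Hypothetical event labels.
    truth = ["flood", "flood", "quake", "collapse"]
    predicted = ["flood", "quake", "quake", "collapse"]
    print(macro_average(per_class_prf(truth, predicted)))
```

Concatenation keeps the semantic and geographic signals in separate dimensions, so any standard classifier can weigh them independently. Macro averaging treats every event class equally regardless of how many documents it contains; other averaging schemes are equally plausible, and the abstract does not state which one the paper uses.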

Author information

Corresponding author

Correspondence to Monika Rani.

About this article

Cite this article

Rani, M., Kaushal, S. A novel framework for multiclass supervised classification of location-sensitive events. Multimed Tools Appl 82, 9667–9692 (2023). https://doi.org/10.1007/s11042-021-11842-8

  • DOI: https://doi.org/10.1007/s11042-021-11842-8
