A novel framework for multiclass supervised classification of location-sensitive events

Rani, Monika; Kaushal, Sakshi

doi:10.1007/s11042-021-11842-8

1222: Intelligent Multimedia Data Analytics and Computing
Published: 16 February 2022

Multimedia Tools and Applications, Volume 82, pages 9667–9692 (2023)

Abstract

In the past couple of years, location-sensitive information retrieval has gained significant attention for extracting and utilizing the location information present in unstructured text. The task requires analysing documents both geographically and thematically, which makes it challenging: the semantics of the text must be associated with the location features it contains. Such association supports fine-grained analysis of the events reported in the text, e.g., tourist location recommendation, disaster surveillance, political activeness and happiness index. Recently, context-based vector space models have gained considerable importance in text mining because they preserve the semantics of the text while representing it in a vector space of the desired dimension. In this paper, a framework for multiclass supervised classification of location-sensitive events, namely LDoc2Vec, is proposed; it integrates context-based vector space models with geographic scope resolution of the events reported in text documents. Variants of the Doc2Vec model are integrated with location features, and their performance for multiclass supervised event classification is analysed. Experimental results with various machine learning classifiers indicate that the proposed framework outperforms baseline Doc2Vec models for multiclass classification of location-sensitive events, as measured by precision, recall and F1-score.
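The abstract does not state how location features enter the Doc2Vec representation or how the metrics are aggregated across event classes. The Python sketch below assumes one simple scheme: a document's resolved geographic scope is one-hot encoded over a small, hypothetical region list (`REGIONS`) and concatenated with the document's Doc2Vec vector, and per-class precision, recall and F1 are averaged into a macro F1. The function names (`location_features`, `ldoc_vector`, `per_class_prf`, `macro_f1`) and the event labels are illustrative only; this is not the authors' LDoc2Vec implementation.

```python
from collections import Counter

# Hypothetical list of geographic scopes; the paper's actual location
# vocabulary and scope-resolution method are not given in the abstract.
REGIONS = ["Delhi", "Maharashtra", "Punjab", "Uttarakhand"]


def location_features(scope, regions=REGIONS):
    """One-hot encode a resolved geographic scope (assumed encoding).

    A scope not in `regions` yields an all-zero vector.
    """
    return [1.0 if r == scope else 0.0 for r in regions]


def ldoc_vector(doc_vec, scope):
    """Concatenate a document embedding with location features.

    `doc_vec` is assumed to be a Doc2Vec document vector; concatenation
    is an assumed combination scheme, not necessarily the paper's.
    """
    return list(doc_vec) + location_features(scope)


def per_class_prf(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    pred_count = Counter(y_pred)
    true_count = Counter(y_true)
    scores = {}
    for label in labels:
        precision = tp[label] / pred_count[label] if pred_count[label] else 0.0
        recall = tp[label] / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(y_true, y_pred):
    """Unweighted mean of the per-class F1 scores."""
    scores = per_class_prf(y_true, y_pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


if __name__ == "__main__":
    # Toy 3-dimensional "document vector" plus a resolved scope.
    print(ldoc_vector([0.12, -0.40, 0.33], "Punjab"))
    # prints [0.12, -0.4, 0.33, 0.0, 0.0, 1.0, 0.0]

    # Hypothetical event labels and classifier predictions.
    y_true = ["flood", "flood", "collapse", "earthquake"]
    y_pred = ["flood", "collapse", "collapse", "earthquake"]
    print(per_class_prf(y_true, y_pred))
    print(round(macro_f1(y_true, y_pred), 3))  # prints 0.778
```

In practice the document vector would come from a trained Doc2Vec model (for example gensim's `Doc2Vec.infer_vector`), and the combined vectors would be passed to an off-the-shelf classifier. Concatenation is only one option; the paper may weight or fuse the location features differently.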

Author information

Authors and Affiliations

University Institute of Engineering & Technology, Panjab University, Chandigarh, 160014, India
Monika Rani & Sakshi Kaushal

Corresponding author

Correspondence to Monika Rani.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

About this article

Cite this article

Rani, M., Kaushal, S. A novel framework for multiclass supervised classification of location-sensitive events. Multimed Tools Appl 82, 9667–9692 (2023). https://doi.org/10.1007/s11042-021-11842-8

Received: 09 March 2021
Revised: 19 August 2021
Accepted: 23 December 2021
Published: 16 February 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s11042-021-11842-8
