Abstract
In the past couple of years, location-sensitive information retrieval has gained significant attention as a way to extract and use the location information present in unstructured text. It requires documents to be analysed both geographically and thematically, which makes it a challenging task: the semantics of the text must be associated with the location features the text contains. This association supports fine-grained analysis of the events reported in the text, with applications such as tourist location recommendation, disaster surveillance, and the measurement of political activity and happiness indices. Recently, context-based vector space models have become important in text mining because they preserve the semantics of text while representing it in a vector space of a chosen dimension. This paper proposes LDoc2Vec, a framework for multiclass supervised classification of location-sensitive events that integrates context-based vector space models with geographic scope resolution of the events reported in text documents. Variants of the Doc2Vec model are integrated with location features, and their performance for multiclass supervised event classification is analysed. Experimental results with various machine learning classifiers indicate that the proposed framework outperforms baseline Doc2Vec models for multiclass classification of location-sensitive events, as measured by precision, recall and F1-score.
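The three metrics named above can be illustrated with a minimal sketch for the multiclass case. The abstract does not say how per-class scores are aggregated, so the macro average used here is an assumption, and the event labels and predictions are hypothetical toy data rather than results from the paper.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for a multiclass problem."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class gains a false positive
            fn[truth] += 1   # true class gains a false negative
    scores = {}
    for c in labels:
        p_den = tp[c] + fp[c]
        r_den = tp[c] + fn[c]
        precision = tp[c] / p_den if p_den else 0.0
        recall = tp[c] / r_den if r_den else 0.0
        pr_sum = precision + recall
        f1 = 2 * precision * recall / pr_sum if pr_sum else 0.0
        scores[c] = (precision, recall, f1)
    return scores


def macro_average(scores):
    """Unweighted mean of precision, recall and F1 over classes (assumed aggregation)."""
    n = len(scores)
    return tuple(sum(s[i] for s in scores.values()) / n for i in range(3))


# Hypothetical event labels and classifier outputs, for illustration only.
y_true = ["flood", "flood", "quake", "collapse", "quake", "collapse"]
y_pred = ["flood", "quake", "quake", "collapse", "quake", "flood"]

scores = per_class_scores(y_true, y_pred)
macro = macro_average(scores)
for label, (p, r, f) in scores.items():
    print(f"{label}: precision={p:.3f} recall={r:.3f} f1={f:.3f}")
print("macro: precision={:.3f} recall={:.3f} f1={:.3f}".format(*macro))
```

Macro averaging weights every event class equally regardless of its size; a support-weighted average is a common alternative when classes are imbalanced.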
About this article
Cite this article
Rani, M., Kaushal, S. A novel framework for multiclass supervised classification of location-sensitive events. Multimed Tools Appl 82, 9667–9692 (2023). https://doi.org/10.1007/s11042-021-11842-8
DOI: https://doi.org/10.1007/s11042-021-11842-8