An effective short text conceptualization based on new short text similarity

Bekkali, Mohammed; Lachkar, Abdelmonaime

doi:10.1007/s13278-018-0544-8

Original Article
Published: 03 December 2018

Volume 9, article number 1, (2019)
Social Network Analysis and Mining

Abstract

Short texts such as messages, tweets, and comments have recently come to make up a large portion of online text data. They are limited in length and differ from traditional documents in their shortness and sparseness. As a result, short text tends to be ambiguous, and the degree of ambiguity is not the same for all languages. Because Arabic is a highly inflectional language in which a single word can have multiple meanings, short text representation plays a vital role in any text mining task. To address these issues, we propose an efficient representation for short text based on concepts instead of terms, using BabelNet as an external knowledge source. However, during conceptualization, searching for the concepts that correspond to a polysemous term yields multiple matches. Assigning a term to a concept is therefore a crucial step, and we believe that short text similarity can help overcome the problem of mapping a term to its corresponding concept. In this paper, we reintroduce the Web-based Kernel function for measuring the semantic relatedness between concepts in order to disambiguate an expression against multiple candidate concepts. The proposed method has been evaluated using an Arabic short text categorization system, and the results obtained demonstrate the value of our contribution.
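The disambiguation step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it only follows the general idea of a web-based kernel, in which each short text is expanded with retrieved snippets, the snippets are turned into a normalised centroid vector, and similarity is the dot product of two centroids. The snippets here are hypothetical in-memory strings standing in for search results, the candidate concept labels are invented for illustration (no BabelNet call is made), raw term counts replace TF-IDF weighting, and English is used instead of Arabic for readability.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenisation; a real Arabic pipeline would stem first.
    return text.lower().split()


def unit_vector(weights):
    # Scale a sparse term vector to unit L2 length (empty if all weights are zero).
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: w / norm for t, w in weights.items()} if norm else {}


def expand(snippets):
    # Expansion of a short text: normalised centroid of its snippets' unit vectors.
    centroid = Counter()
    for snippet in snippets:
        for term, w in unit_vector(Counter(tokenize(snippet))).items():
            centroid[term] += w
    return unit_vector(centroid)


def kernel(a, b):
    # Kernel value: dot product of two expanded (unit-length) vectors.
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def disambiguate(context_snippets, candidates):
    # Pick the candidate concept whose expansion is most similar to the context.
    ctx = expand(context_snippets)
    scores = {name: kernel(ctx, expand(s)) for name, s in candidates.items()}
    return max(scores, key=scores.get), scores


# Hypothetical snippets for the short text "deposit money at the bank".
context = ["deposit money in a savings account", "bank loan interest money"]
# Hypothetical candidate concepts for the polysemous term "bank".
candidates = {
    "bank (finance)": ["institution that accepts money deposits and gives loan"],
    "bank (river)": ["sloping land beside a river or lake"],
}
best, scores = disambiguate(context, candidates)
print(best, scores)
```

In this sketch the term is mapped to the candidate with the highest kernel value. A fuller version would fetch the snippets from a search engine, weight terms with TF-IDF, and take the candidate concepts from BabelNet.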

Author information

Authors and Affiliations

LISA Laboratory, ENSA, USMBA, Fez, Morocco
Mohammed Bekkali
ENSA, AEU, Tangier, Morocco
Abdelmonaime Lachkar

Corresponding author

Correspondence to Mohammed Bekkali.

About this article

Cite this article

Bekkali, M., Lachkar, A. An effective short text conceptualization based on new short text similarity. Soc. Netw. Anal. Min. 9, 1 (2019). https://doi.org/10.1007/s13278-018-0544-8

Received: 12 June 2018
Revised: 07 November 2018
Accepted: 14 November 2018
Published: 03 December 2018
DOI: https://doi.org/10.1007/s13278-018-0544-8
