Efficiently mining frequent itemsets applied for textual aggregation

Bouakkaz, Mustapha; Ouinten, Youcef; Loudcher, Sabine; Fournier-Viger, Philippe

doi:10.1007/s10489-017-1050-9

Published: 11 August 2017

Applied Intelligence, Volume 48, pages 1013–1019 (2018)

Mustapha Bouakkaz¹, Youcef Ouinten¹, Sabine Loudcher² & Philippe Fournier-Viger³

Abstract

Text mining approaches are commonly used to discover relevant information and relationships in large volumes of text. Data mining refers to methods that analyze data in order to find patterns capturing its main properties. Combining data mining approaches with on-line analytical processing (OLAP) tools makes it possible to refine the techniques used for textual aggregation. In this paper, we propose a novel aggregation function for textual data based on the discovery of frequent closed patterns in a generated document/keyword matrix. Our contribution uses a data mining technique, specifically a closed pattern mining algorithm, to aggregate keywords. An experimental study on a real corpus of more than 700 scientific papers collected from Microsoft Academic Search shows that the proposed algorithm substantially outperforms four state-of-the-art textual aggregation methods in terms of recall, precision, F-measure and runtime.
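The abstract describes aggregating keywords by mining frequent closed patterns in a binary document/keyword matrix and evaluating the result with precision, recall and F-measure. The Python sketch below illustrates that idea under stated assumptions; it is not the authors' algorithm, which this page does not describe. It uses a naive level-wise (Apriori-style) enumeration followed by a closedness filter, and both the keyword-scoring rule in the hypothetical `aggregate_keywords` function and the toy corpus are invented for illustration.

```python
from itertools import combinations


def frequent_closed_itemsets(documents, min_support):
    """Mine frequent closed itemsets from a binary document/keyword matrix.

    Each document is given as the set of keywords whose matrix cell is 1.
    Returns {frozenset(keywords): support}, support being a document count.
    """
    def support(itemset):
        return sum(1 for doc in documents if itemset <= doc)

    # Level-wise (Apriori-style) enumeration of all frequent itemsets.
    frequent = {}
    level = {frozenset([kw]) for doc in documents for kw in doc}
    while level:
        current = {}
        for candidate in level:
            s = support(candidate)
            if s >= min_support:
                current[candidate] = s
        frequent.update(current)
        # Join pairs of frequent k-itemsets whose union has exactly k + 1 items.
        level = {a | b for a, b in combinations(current, 2)
                 if len(a | b) == len(a) + 1}
    # Closed: no proper superset has the same support.
    return {x: s for x, s in frequent.items()
            if not any(x < y and s == sy for y, sy in frequent.items())}


def aggregate_keywords(closed, k):
    """Hypothetical aggregation rule (an assumption, not the paper's):
    score each keyword by the summed support of the closed itemsets that
    contain it, then keep the k best keywords (ties broken alphabetically)."""
    scores = {}
    for itemset, sup in closed.items():
        for kw in itemset:
            scores[kw] = scores.get(kw, 0) + sup
    return sorted(scores, key=lambda kw: (-scores[kw], kw))[:k]


def precision_recall_f1(predicted, reference):
    """Set-based precision, recall and F-measure of predicted keywords."""
    predicted, reference = set(predicted), set(reference)
    hits = len(predicted & reference)
    p = hits / len(predicted) if predicted else 0.0
    r = hits / len(reference) if reference else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f


# Hypothetical toy corpus: one keyword set per document.
docs = [
    {"olap", "text", "mining"},
    {"olap", "text", "aggregation"},
    {"olap", "text", "mining", "aggregation"},
    {"clustering", "text"},
]
closed = frequent_closed_itemsets(docs, min_support=2)
summary = aggregate_keywords(closed, k=2)
print(summary)  # ['text', 'olap']
print(precision_recall_f1(summary, {"olap", "text", "aggregation"}))
```

On this toy input the closed itemsets are {text}, {olap, text}, {olap, text, mining} and {olap, text, aggregation}; every other frequent itemset has a superset with the same support. The naive enumeration rescans all documents for each candidate, so it only suits small matrices; dedicated closed-itemset miners avoid that cost.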

Author information

Authors and Affiliations

LIM Laboratory, Laghouat University, Laghouat, Algeria
Mustapha Bouakkaz & Youcef Ouinten
ERIC Laboratory, Lyon 2 University, Lyon, France
Sabine Loudcher
Harbin Institute of Technology Shenzhen, Shenzhen, China
Philippe Fournier-Viger

Corresponding author

Correspondence to Mustapha Bouakkaz.

About this article

Cite this article

Bouakkaz, M., Ouinten, Y., Loudcher, S. et al. Efficiently mining frequent itemsets applied for textual aggregation. Appl Intell 48, 1013–1019 (2018). https://doi.org/10.1007/s10489-017-1050-9

Issue Date: April 2018

Similar content being viewed by others

Evaluating Top-K Approximate Patterns via Text Clustering

Techniques, Applications, and Issues in Mining Large-Scale Text Databases

Data Mining, Management and Visualization in Large Scientific Corpuses