Abstract
The scarcity of bilingual and multilingual parallel corpora has prompted many researchers to stress the need for new methods to enhance the quality of comparable corpora. In this paper, we highlight the usefulness of Formal Concept Analysis for multilingual document clustering as a means of improving corpus comparability. We propose a statistical approach to clustering multilingual documents, based on multilingual Closed Concepts Mining, that partitions documents belonging to one or more collections and written in more than one language into a set of classes. An experimental evaluation conducted on two collections showed a significant improvement in the comparability of the generated classes.
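The abstract does not spell out the mining step, so the following is only a minimal sketch of what a closed (formal) concept is in Formal Concept Analysis, not the authors' algorithm. It assumes that terms from all languages have already been mapped to a shared vocabulary; the toy context and the helper names (extent, intent, closed_concepts) are hypothetical. A concept pairs a set of documents (the extent) with the set of terms they all share (the intent), and each extent is a candidate class of comparable documents.

```python
from itertools import combinations

# Toy formal context (hypothetical data): document -> set of index terms.
# Assumption: terms of both languages are already mapped to one vocabulary.
context = {
    "d1_en": {"economy", "bank"},
    "d2_fr": {"economy", "bank", "tax"},
    "d3_en": {"football", "match"},
    "d4_fr": {"football", "match", "goal"},
}

def extent(terms, ctx):
    """Return the documents that contain every term in `terms`."""
    return frozenset(d for d, ts in ctx.items() if terms <= ts)

def intent(docs, ctx):
    """Return the terms shared by every document in `docs`.

    For an empty set of documents this is the whole vocabulary.
    """
    shared = set().union(*ctx.values())
    for d in docs:
        shared &= ctx[d]
    return frozenset(shared)

def closed_concepts(ctx):
    """Enumerate all formal concepts (extent, intent) of a small context.

    Brute force: close every subset of documents. This is exponential in
    the number of documents and only meant for toy-sized inputs.
    """
    concepts = set()
    docs = list(ctx)
    for r in range(len(docs) + 1):
        for subset in combinations(docs, r):
            i = intent(subset, ctx)   # terms common to the subset
            e = extent(i, ctx)        # all documents sharing those terms
            concepts.add((e, i))
    return concepts

concepts = closed_concepts(context)
# Print concepts from the largest extent to the smallest.
for e, i in sorted(concepts, key=lambda c: (-len(c[0]), sorted(c[0]))):
    print(sorted(e), sorted(i))
```

In this toy context the concept with intent {economy, bank} groups one English and one French document, which is the kind of cross-language class the abstract refers to. Practical closed itemset miners avoid the brute-force enumeration used here.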
Notes
- 1. In this paper, we denote by |X| the cardinality of the set X.
Acknowledgements
This work is partially funded by the DGRST-CNRS n° 14/R 1401 Franco-Tunisian project, entitled “Text mining for construction of bilingual lexicons and multilingual information retrieval”.
Copyright information
© 2015 Springer International Publishing Switzerland
About this paper
Cite this paper
Chebel, M., Latiri, C., Gaussier, E. (2015). Multilingual Documents Clustering Based on Closed Concepts Mining. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_36
DOI: https://doi.org/10.1007/978-3-319-22849-5_36
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22848-8
Online ISBN: 978-3-319-22849-5
eBook Packages: Computer Science, Computer Science (R0)