Multilingual Documents Clustering Based on Closed Concepts Mining

Chebel, Mohamed; Latiri, Chiraz; Gaussier, Eric

doi:10.1007/978-3-319-22849-5_36

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 9261)

Abstract

The scarcity of bilingual and multilingual parallel corpora has prompted many researchers to emphasize the need for new methods to enhance the quality of comparable corpora. In this paper, we highlight the interest and usefulness of Formal Concept Analysis in multilingual document clustering to improve corpora comparability. We propose a statistical approach for clustering multilingual documents, based on multilingual Closed Concepts Mining, that partitions documents belonging to one or more collections and written in more than one language into a set of classes. Experimental evaluation was conducted on two collections and showed a significant improvement in the comparability of the generated classes.
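As a rough illustration of what mining closed concepts over a document collection can look like, the sketch below enumerates the closed concepts of a tiny document-term context. It is not the method of the paper: the function name closed_concepts, the toy documents, and the assumption that terms from different languages have already been mapped into one shared vocabulary are all hypothetical, and the enumeration is the naive intersection-closure procedure rather than an optimized miner.

```python
def closed_concepts(context):
    """Enumerate the closed concepts of a binary document-term context.

    context: dict mapping a document id to the set of terms it contains.
    Returns a dict mapping each closed term set (intent) to the frozenset
    of documents that contain all of its terms (extent).
    Naive sketch: the intents are all intersections of document term sets,
    plus the full vocabulary (the intersection over no documents).
    """
    vocabulary = frozenset().union(*context.values())
    intents = {vocabulary}
    for terms in context.values():
        # Intersect the new document with every intent found so far.
        intents |= {frozenset(terms) & intent for intent in intents}

    concepts = {}
    for intent in intents:
        extent = frozenset(
            doc for doc, terms in context.items() if intent <= terms
        )
        concepts[intent] = extent
    return concepts


# Hypothetical toy collection: English and French documents whose terms
# are assumed to be already translated into a shared vocabulary.
docs = {
    "en_1": {"election", "vote", "president"},
    "fr_1": {"election", "vote", "parliament"},
    "en_2": {"football", "goal"},
    "fr_2": {"football", "goal", "vote"},
}

concepts = closed_concepts(docs)
for intent, extent in sorted(concepts.items(), key=lambda kv: -len(kv[1])):
    print(sorted(extent), sorted(intent))
```

In this toy context, the concept with intent {election, vote} groups en_1 with fr_1, and the concept with intent {football, goal} groups en_2 with fr_2, which is the kind of cross-language grouping a concept-based clustering could build on.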

Notes

1. In this paper, we denote by |X| the cardinality of the set X.
2. http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger/
3. https://translate.google.com/
4. http://www.lemurproject.org/

Acknowledgements

This work is partially funded by the DGRST-CNRS n° 14/R 1401 Franco-Tunisian project, entitled “Text mining for construction of bilingual lexicons and multilingual information retrieval”.

Author information

Authors and Affiliations

Research Laboratory LIPAH, Faculty of Sciences of Tunis, University Tunis El Manar, Tunis, Tunisia
Mohamed Chebel & Chiraz Latiri
Research Laboratory LIG, AMA Group, University Joseph Fourier (Grenoble I), Grenoble, France
Eric Gaussier

Corresponding author

Correspondence to Mohamed Chebel.

Editor information

Editors and Affiliations

Hewlett-Packard Enterprise, Sunnyvale, California, USA
Qiming Chen
Paul Sabatier University, Toulouse, France
Abdelkader Hameurlain
Blaise Pascal University, Aubiere, France
Farouk Toumani
University of Linz, Linz, Austria
Roland Wagner
Universidad Politécnica de Valencia, Valencia, Spain
Hendrik Decker

About this paper

Cite this paper

Chebel, M., Latiri, C., Gaussier, E. (2015). Multilingual Documents Clustering Based on Closed Concepts Mining. In: Chen, Q., Hameurlain, A., Toumani, F., Wagner, R., Decker, H. (eds) Database and Expert Systems Applications. Globe DEXA 2015. Lecture Notes in Computer Science, vol 9261. Springer, Cham. https://doi.org/10.1007/978-3-319-22849-5_36

DOI: https://doi.org/10.1007/978-3-319-22849-5_36
Published: 11 August 2015
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-22848-8
Online ISBN: 978-3-319-22849-5
eBook Packages: Computer Science, Computer Science (R0)
