Parameters Driving Effectiveness of LSA on Topic Segmentation

Naili, Marwa; Habacha, Anja Chaibi; Ben Ghezala, Henda Hajjami

doi:10.1007/978-3-319-75477-2_40

Part of the book series: Lecture Notes in Computer Science (LNTCS, volume 9623)

Included in the following conference series:

International Conference on Intelligent Text Processing and Computational Linguistics

Abstract

Latent Semantic Analysis (LSA) is an efficient statistical technique for extracting semantic knowledge from large corpora. A major difficulty with this technique is identifying the most effective LSA parameters and the best combination of them. In this paper, we therefore propose a new topic segmenter and use it to study in depth the different LSA parameters for topic segmentation. The aim of this study is to analyze the effect of these parameters on the quality of topic segmentation and to identify the most effective ones. Extensive experiments show that the quality of topic segmentation is very sensitive to the choice of LSA parameters. More importantly, this study allows us to propose appropriate recommendations for selecting parameters in the field of topic segmentation.
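
As a rough illustration of what such parameters can look like in practice, the following Python sketch builds a toy LSA-based segmenter with NumPy. It is not the segmenter proposed in this paper, and the abstract does not list the parameters studied: the function names (`lsa_sentence_vectors`, `adjacent_similarities`, `boundaries`), the two weighting options, the number of retained dimensions `k`, and the similarity `threshold` are all assumptions chosen for the example.

```python
import numpy as np


def lsa_sentence_vectors(sentences, k=2, weighting="tfidf"):
    """Return an array of shape (n_sentences, k) of LSA sentence vectors.

    `weighting` and `k` are illustrative LSA parameters (assumptions),
    not the parameter set evaluated in the paper.
    """
    tokens = [s.lower().split() for s in sentences]
    vocab = sorted({w for t in tokens for w in t})
    index = {w: i for i, w in enumerate(vocab)}

    # Term-by-sentence matrix of raw counts.
    counts = np.zeros((len(vocab), len(sentences)))
    for j, sentence_tokens in enumerate(tokens):
        for w in sentence_tokens:
            counts[index[w], j] += 1.0

    if weighting == "raw":
        matrix = counts
    elif weighting == "tfidf":
        # Number of sentences containing each term (always >= 1 here).
        df = np.count_nonzero(counts, axis=1)
        idf = np.log(len(sentences) / df) + 1.0
        matrix = counts * idf[:, None]
    else:
        raise ValueError(f"unknown weighting: {weighting!r}")

    # Truncated SVD: keep only the k largest singular values.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    k = min(k, len(s))
    return (vt[:k, :] * s[:k, None]).T


def adjacent_similarities(vectors):
    """Cosine similarity between each sentence vector and the next one."""
    sims = []
    for a, b in zip(vectors[:-1], vectors[1:]):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(float(a @ b / denom) if denom > 0 else 0.0)
    return sims


def boundaries(sims, threshold):
    """Indices of sentences that start a new segment (similarity drop)."""
    return [i + 1 for i, sim in enumerate(sims) if sim < threshold]


if __name__ == "__main__":
    text = [
        "cats purr softly",
        "cats purr loudly",
        "stocks fall sharply today",
        "stocks fall slowly today",
    ]
    vectors = lsa_sentence_vectors(text, k=2, weighting="tfidf")
    sims = adjacent_similarities(vectors)
    print(sims)
    print(boundaries(sims, threshold=0.5))
```

Changing `weighting`, `k`, or `threshold` and re-running on the same text is one simple way to observe how sensitive the predicted boundaries are to these choices.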

Notes

1. ACM home. http://dl.acm.org/.

Acknowledgments

We would like to express our special gratitude to Professor Emeritus Mouhamed Ben Ahmed, who invested his effort in guiding this research and contributed precious suggestions, support, and encouragement.

Author information

Authors and Affiliations

RIADI Laboratory, National School of Computer Science (ENSI), University of Manouba, 2010, Manouba, Tunisia
Marwa Naili, Anja Chaibi Habacha & Henda Hajjami Ben Ghezala

Corresponding author

Correspondence to Anja Chaibi Habacha.

Editor information

Editors and Affiliations

CIC, Instituto Politécnico Nacional, Mexico City, Mexico
Alexander Gelbukh

About this paper

Cite this paper

Naili, M., Habacha, A.C., Ben Ghezala, H.H. (2018). Parameters Driving Effectiveness of LSA on Topic Segmentation. In: Gelbukh, A. (ed.) Computational Linguistics and Intelligent Text Processing. CICLing 2016. Lecture Notes in Computer Science, vol 9623. Springer, Cham. https://doi.org/10.1007/978-3-319-75477-2_40

DOI: https://doi.org/10.1007/978-3-319-75477-2_40
Published: 21 March 2018
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-75476-5
Online ISBN: 978-3-319-75477-2
eBook Packages: Computer Science, Computer Science (R0)
