ABSTRACT
This paper investigates the extraction of concepts from texts using clustering algorithms. We applied a hybrid approach to select feature candidates and used the CLUTO tool to support the clustering of terms. The identified concepts were analyzed manually. We discuss the details and preliminary results of this approach for Portuguese texts.
Index Terms
- Unsupervised approach for concept extraction from texts (Abordagem não supervisionada para extração de conceitos a partir de textos)