Abstract
Synonymy has long been important in information retrieval and automatic indexing. Recently, the needs of domain ontology building and maintenance have renewed interest in the problem. In this paper, we present a novel text mining approach to discovering synonyms or terms of close meaning. The proposed measures of closeness of terms (or of their contexts) are expressed by means of data mining notions, namely frequent termsets and association rules. The measures can be calculated using data mining techniques, such as the well-known Apriori algorithm. The approach is domain-independent and applicable at large scale. It does, however, depend on the recognition of parts of speech. In that sense the approach is language dependent, to the extent that the part-of-speech tagging process is language dependent. Experimental results obtained with the approach are presented.
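To make the notions of frequent termsets and context-based closeness concrete, the following is a minimal sketch, not the measures defined in the paper. It assumes each sentence has already been reduced to a set of terms (for example by part-of-speech filtering), mines frequent termsets with a plain Apriori loop, and compares two terms by the Jaccard overlap of the frequent termsets they occur in. The function names `apriori`, `contexts` and `closeness`, the Jaccard measure, and the toy sentences are all illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {termset: support count} for every frequent termset."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single terms.
    counts = {}
    for t in transactions:
        for term in t:
            key = frozenset([term])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: n for s, n in counts.items() if n >= min_support}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join step: unite frequent (k-1)-sets into candidate k-sets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev
                             for s in combinations(c, k - 1))}
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_support}
        result.update(frequent)
        k += 1
    return result


def contexts(term, other, frequent_termsets):
    """Frequent termsets containing `term` but not `other`, minus `term`."""
    return {s - {term} for s in frequent_termsets
            if term in s and other not in s and len(s) > 1}


def closeness(a, b, frequent_termsets):
    """Hypothetical measure: Jaccard overlap of the contexts of a and b."""
    ca = contexts(a, b, frequent_termsets)
    cb = contexts(b, a, frequent_termsets)
    union = ca | cb
    return len(ca & cb) / len(union) if union else 0.0


# Toy corpus (assumed): one set of terms per sentence.
sentences = [
    {"car", "engine", "repair"},
    {"car", "engine", "fuel"},
    {"automobile", "engine", "repair"},
    {"automobile", "engine", "fuel"},
    {"bicycle", "wheel", "repair"},
    {"bicycle", "wheel", "chain"},
]
frequent_termsets = apriori(sentences, min_support=2)
print(closeness("car", "automobile", frequent_termsets))
print(closeness("car", "bicycle", frequent_termsets))
```

In this toy corpus, "car" and "automobile" never co-occur but share the same frequent context, so the sketch scores them as close, whereas "car" and "bicycle" share none. A real corpus would need a much higher support threshold and a part-of-speech tagging step before the termsets are built.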
This work was performed within a project funded by France Telecom.
Copyright information
© 2007 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Rybinski, H., Kryszkiewicz, M., Protaziuk, G., Jakubowski, A., Delteil, A. (2007). Discovering Synonyms Based on Frequent Termsets. In: Kryszkiewicz, M., Peters, J.F., Rybinski, H., Skowron, A. (eds) Rough Sets and Intelligent Systems Paradigms. RSEISP 2007. Lecture Notes in Computer Science, vol 4585. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73451-2_54
DOI: https://doi.org/10.1007/978-3-540-73451-2_54
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73450-5
Online ISBN: 978-3-540-73451-2
eBook Packages: Computer Science, Computer Science (R0)