Abstract
This paper proposes a text mining methodology based on the classical knowledge discovery loop, with a number of adaptations. First, the texts are indexed and prepared for a levelwise search for frequent itemsets. Association rules are then extracted and interpreted under the control of an analyst, with respect to a set of quality measures and domain knowledge. The article includes an experiment on a real-world corpus of molecular biology texts.
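The two mining steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each text has already been indexed as a set of terms, uses a plain Apriori-style levelwise search, and scores rules with support and confidence only, which stand in for whatever quality measures the article actually uses. The function names, thresholds, and the toy terms are all hypothetical.

```python
from itertools import combinations


def frequent_itemsets(docs, min_support):
    """Levelwise (Apriori-style) search; returns {itemset: support count}."""
    transactions = [frozenset(d) for d in docs]
    # Level 1 candidates: every single term seen in the corpus.
    candidates = {frozenset([t]) for d in transactions for t in d}
    frequent = {}
    k = 1
    while candidates:
        # Count in how many texts each candidate itemset occurs.
        counts = {c: sum(1 for d in transactions if c <= d) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Build (k+1)-candidates by joining frequent k-itemsets, keeping
        # only those whose k-subsets are all frequent.
        k += 1
        candidates = {
            a | b
            for a in level
            for b in level
            if len(a | b) == k
            and all(frozenset(s) in level for s in combinations(a | b, k - 1))
        }
    return frequent


def association_rules(frequent, min_confidence):
    """Derive rules lhs -> rhs as (lhs, rhs, support count, confidence)."""
    rules = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # Every subset of a frequent itemset is itself frequent,
                # so its count is available from an earlier level.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, count, confidence))
    return rules


# Hypothetical toy corpus: each text is reduced to its set of index terms.
docs = [
    {"gene", "mutation", "resistance"},
    {"gene", "mutation"},
    {"gene", "resistance"},
    {"mutation", "resistance", "antibiotic"},
]
freq = frequent_itemsets(docs, min_support=2)
rules = association_rules(freq, min_confidence=0.6)
```

In this sketch an analyst would then inspect `rules`, ranking or filtering them with further measures and with domain knowledge. That interpretation step is left out here.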
Cite this article
Cherfi, H., Napoli, A. & Toussaint, Y. Towards a text mining methodology using association rule extraction. Soft Comput 10, 431–441 (2006). https://doi.org/10.1007/s00500-005-0504-x
DOI: https://doi.org/10.1007/s00500-005-0504-x