Towards a text mining methodology using association rule extraction

  • Focus
  • Soft Computing

Abstract

This paper proposes a text mining methodology based on the classical knowledge discovery loop, with a number of adaptations. First, texts are indexed and prepared for processing by a levelwise frequent itemset search. Association rules are then extracted and interpreted with respect to a set of quality measures and domain knowledge, under the control of an analyst. The article includes an experiment on a real-world text corpus in molecular biology.
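
The two mining steps named in the abstract (levelwise frequent itemset search, then association rule extraction) can be illustrated with a minimal Python sketch. It is a generic Apriori-style procedure, not the authors' implementation: the function names, the toy term sets, and the thresholds are all assumptions made for illustration, and support and confidence stand in for the fuller set of quality measures the paper refers to.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Levelwise search; returns {itemset: support count} for frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single term seen in the corpus.
    level = [frozenset([item]) for item in {i for t in transactions for i in t}]
    frequent = {}
    k = 1
    while level:
        # Count how many texts contain each candidate itemset.
        counts = {c: sum(1 for t in transactions if c <= t) for c in level}
        current = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(current)
        # Join step: merge frequent k-itemsets into (k+1)-item candidates.
        k += 1
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        level = [c for c in candidates
                 if all(frozenset(s) in current for s in combinations(c, k - 1))]
    return frequent


def association_rules(frequent, min_confidence):
    """Returns (antecedent, consequent, support count, confidence) tuples."""
    rules = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                # Subsets of a frequent itemset are frequent, so the lookup is safe.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, count, confidence))
    return rules


# Hypothetical indexed texts: each text is reduced to a set of index terms.
docs = [
    {"gene", "mutation", "resistance"},
    {"gene", "mutation"},
    {"gene", "resistance"},
    {"mutation", "resistance", "antibiotic"},
    {"gene", "mutation", "resistance"},
]
freq = frequent_itemsets(docs, min_support=3)
rules = association_rules(freq, min_confidence=0.75)
for lhs, rhs, support, confidence in rules:
    print(sorted(lhs), "->", sorted(rhs), support, confidence)
```

In this sketch the support threshold is an absolute count of texts rather than a fraction. An analyst would then rank or filter the resulting rules using further quality measures and domain knowledge, a step the code does not attempt.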

Author information

Correspondence to H. Cherfi.

About this article

Cite this article

Cherfi, H., Napoli, A. & Toussaint, Y. Towards a text mining methodology using association rule extraction. Soft Comput 10, 431–441 (2006). https://doi.org/10.1007/s00500-005-0504-x

  • DOI: https://doi.org/10.1007/s00500-005-0504-x
