
Mining Rough Association from Text Documents for Web Information Gathering

  • Chapter

Part of the book series: Lecture Notes in Computer Science (TRS, volume 4400)

Abstract

Guaranteeing the quality of association rules is a major challenge in some application areas (e.g., Web information gathering) because of duplications and ambiguities in data values (e.g., terms). Rough set based decision tables could be efficient tools for addressing this challenge. This paper first illustrates the relationship between decision tables and association mining, and proves that a decision rule is a kind of closed pattern. It then presents an alternative concept, rough association rules, to improve the quality of discovered knowledge in this area. The premise of a rough association rule consists of a set of terms (items) and a weight distribution over those terms (items). The distinct advantage of rough association rules is that they contain more specific information than normal association rules. The paper also reports experiments comparing the proposed method with association rule mining and decision tables; the experimental results indicate that the proposed approach is promising.
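
To make the idea of a premise that carries both a term set and a weight distribution concrete, the following is a minimal sketch only, not the chapter's algorithm. It assumes each document is given as a term-frequency dictionary, that documents sharing the same term set are merged into one group, and that the weight distribution is the group's term frequencies normalized to sum to 1. The function name `rough_association_premises` and the sample data are hypothetical.

```python
from collections import Counter, defaultdict


def rough_association_premises(docs):
    """Group documents by term set and attach a normalized weight distribution.

    Assumption (not from the chapter): weights are summed term frequencies
    within a group, divided by the group's total frequency.
    """
    groups = defaultdict(Counter)  # term set -> summed term frequencies
    support = Counter()            # term set -> number of documents in the group
    for term_freqs in docs:
        key = frozenset(term_freqs)
        groups[key].update(term_freqs)  # Counter.update adds counts
        support[key] += 1

    premises = []
    for key, counts in groups.items():
        total = sum(counts.values())
        weights = {term: count / total for term, count in counts.items()}
        premises.append((key, weights, support[key]))
    return premises


# Hypothetical toy collection: two documents share the term set {rough, set}.
docs = [
    {"rough": 2, "set": 1},
    {"rough": 1, "set": 3},
    {"web": 1, "mining": 1},
]

for terms, weights, n_docs in rough_association_premises(docs):
    print(sorted(terms), weights, n_docs)
```

An ordinary association rule would record only the term set {rough, set}; under the assumptions above, the premise additionally records how the weight is spread across those terms, which is the extra specificity the abstract refers to.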

Editor information

James F. Peters, Andrzej Skowron, Victor W. Marek, Ewa Orłowska, Roman Słowiński, Wojciech Ziarko

Copyright information

© 2007 Springer Berlin Heidelberg

About this chapter

Cite this chapter

Li, Y., Zhong, N. (2007). Mining Rough Association from Text Documents for Web Information Gathering. In: Peters, J.F., Skowron, A., Marek, V.W., Orłowska, E., Słowiński, R., Ziarko, W. (eds) Transactions on Rough Sets VII. Lecture Notes in Computer Science, vol 4400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71663-1_7

  • DOI: https://doi.org/10.1007/978-3-540-71663-1_7

  • Publisher Name: Springer, Berlin, Heidelberg

  • Print ISBN: 978-3-540-71662-4

  • Online ISBN: 978-3-540-71663-1

  • eBook Packages: Computer Science, Computer Science (R0)
