Mining Rough Association from Text Documents for Web Information Gathering

Li, Yuefeng; Zhong, Ning

doi:10.1007/978-3-540-71663-1_7


Chapter


Part of the book series: Lecture Notes in Computer Science (TRS, volume 4400)

Abstract

It is a big challenge to guarantee the quality of association rules in some application areas (e.g., Web information gathering) because of duplications and ambiguities in data values (e.g., terms). Rough set-based decision tables could be efficient tools for meeting this challenge. This paper first illustrates the relationship between decision tables and association mining and proves that a decision rule is a kind of closed pattern. It then presents an alternative concept, rough association rules, to improve the quality of discovered knowledge in this area. The premise of a rough association rule consists of a set of terms (items) and a weight distribution over those terms. The distinct advantage of rough association rules is that they contain more specific information than normal association rules. The paper also reports experiments comparing the proposed method with association rule mining and decision tables, and the experimental results verify that the proposed approach is promising.
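To make the two central ideas of the abstract concrete, here is a minimal Python sketch under stated assumptions; it is not the authors' algorithm. Each document is modelled as a bag of terms, a document's term set is treated as the condition part of a decision-table row, and closedness is checked with the usual Galois closure (the intersection of all term sets that contain a pattern). The term set of any single document is always closed, which mirrors the claim that decision rules are closed patterns. The function `rough_premises` then attaches a weight distribution to each distinct term set by pooling and normalizing term frequencies across the documents that share it. This weighting scheme, the toy corpus, and all function names are hypothetical illustrations rather than the paper's definitions.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

# A document is modelled as a bag of terms (term -> frequency).
# ASSUMPTION: this toy corpus is illustrative only, not data from the paper.
Doc = Dict[str, int]

docs: List[Doc] = [
    {"rough": 2, "set": 1, "mining": 1},
    {"rough": 1, "set": 1, "mining": 2},
    {"web": 3, "mining": 1},
]


def termset(doc: Doc) -> FrozenSet[str]:
    """Condition part of a decision-table row: the set of terms in a document."""
    return frozenset(doc)


def closure(pattern: FrozenSet[str], corpus: List[Doc]) -> FrozenSet[str]:
    """Galois closure: intersect the term sets of all documents containing `pattern`."""
    covering = [termset(d) for d in corpus if pattern <= termset(d)]
    if not covering:
        # By convention, a pattern with no support closes to the whole vocabulary.
        return frozenset().union(*(termset(d) for d in corpus))
    result = covering[0]
    for ts in covering[1:]:
        result = result & ts
    return result


def is_closed(pattern: FrozenSet[str], corpus: List[Doc]) -> bool:
    """A pattern is closed if its closure adds no further terms."""
    return closure(pattern, corpus) == pattern


def rough_premises(corpus: List[Doc]) -> Dict[FrozenSet[str], Dict[str, float]]:
    """HYPOTHETICAL weighting: group documents with identical term sets
    (the equivalence classes of the decision table) and normalize the
    pooled term frequencies of each group into a distribution summing to 1."""
    groups: Dict[FrozenSet[str], Counter] = {}
    for d in corpus:
        groups.setdefault(termset(d), Counter()).update(d)
    premises: Dict[FrozenSet[str], Dict[str, float]] = {}
    for ts, counts in groups.items():
        total = sum(counts.values())
        premises[ts] = {t: counts[t] / total for t in sorted(ts)}
    return premises


if __name__ == "__main__":
    # Every document's full term set is closed, like a decision-table row.
    print(all(is_closed(termset(d), docs) for d in docs))  # True
    # {"rough"} is not closed: every document containing it also has "set" and "mining".
    print(is_closed(frozenset({"rough"}), docs))  # False
    premises = rough_premises(docs)
    print(premises[frozenset({"rough", "set", "mining"})])
    # {'mining': 0.375, 'rough': 0.375, 'set': 0.25}
```

Grouping rows by identical term sets is what keeps the weights well defined in this sketch: each group is one equivalence class, so its premise is a closed term set together with a distribution over exactly those terms, which carries more information than the bare term set used by an ordinary association rule.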



Author information

Yuefeng Li: School of Software Engineering and Data Communications, Queensland University of Technology, Brisbane QLD 4001, Australia
Ning Zhong: Department of Information Engineering, Maebashi Institute of Technology, Maebashi 371-0816, Japan


Editor information

James F. Peters, Andrzej Skowron, Victor W. Marek, Ewa Orłowska, Roman Słowiński, Wojciech Ziarko


About this chapter

Cite this chapter

Li, Y., Zhong, N. (2007). Mining Rough Association from Text Documents for Web Information Gathering. In: Peters, J.F., Skowron, A., Marek, V.W., Orłowska, E., Słowiński, R., Ziarko, W. (eds) Transactions on Rough Sets VII. Lecture Notes in Computer Science, vol 4400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71663-1_7


DOI: https://doi.org/10.1007/978-3-540-71663-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71662-4
Online ISBN: 978-3-540-71663-1
eBook Packages: Computer Science (R0)
