Abstract
It is a big challenge to guarantee the quality of association rules in some application areas (e.g., Web information gathering) because of duplications and ambiguities in data values (e.g., terms). Rough-set-based decision tables could be efficient tools for addressing this challenge. This paper first illustrates the relationship between decision tables and association mining, and proves that a decision rule is a kind of closed pattern. It then presents an alternative concept, rough association rules, to improve the quality of the knowledge discovered in this area. The premise of a rough association rule consists of a set of terms (items) and a weight distribution over those terms (items). The distinct advantage of rough association rules is that they contain more specific information than normal association rules. The paper also reports experiments comparing the proposed method with association rule mining and decision tables, and the experimental results verify that the proposed approach is promising.
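To make two ideas from the abstract more concrete, the Python sketch below treats each document as a bag of terms, groups documents with identical term sets into granules (the rows of a decision table), and checks that each granule's term set is a closed pattern, i.e. equal to the intersection of all documents that contain it. It also attaches a weight distribution to each granule's term set, mirroring the "set of terms plus weight distribution" premise. The toy documents, the function names (`closure`, `is_closed`, `granules`, `weight_distribution`) and the weighting scheme (pooled raw term frequencies normalized to sum to 1) are all hypothetical choices for illustration, not the authors' definitions or algorithm.

```python
from collections import Counter
from typing import Dict, FrozenSet, List

# Hypothetical toy corpus: each document is a bag of terms (term -> frequency).
DOCS: List[Counter] = [
    Counter({"rough": 2, "set": 1}),
    Counter({"rough": 1, "set": 3}),
    Counter({"rough": 1, "set": 1, "mining": 2}),
    Counter({"mining": 1, "web": 1}),
]


def closure(termset: FrozenSet[str], docs: List[Counter]) -> FrozenSet[str]:
    """Intersect the term sets of every document that contains `termset`."""
    covering = [frozenset(d) for d in docs if termset <= frozenset(d)]
    if not covering:
        # No document contains the termset: by convention the closure is
        # the set of all terms in the corpus.
        return frozenset(t for d in docs for t in d)
    return frozenset.intersection(*covering)


def is_closed(termset: FrozenSet[str], docs: List[Counter]) -> bool:
    """A termset is closed when it equals its own closure."""
    return closure(termset, docs) == termset


def granules(docs: List[Counter]) -> Dict[FrozenSet[str], List[Counter]]:
    """Group documents with identical term sets (rows of a decision table)."""
    groups: Dict[FrozenSet[str], List[Counter]] = {}
    for d in docs:
        groups.setdefault(frozenset(d), []).append(d)
    return groups


def weight_distribution(group: List[Counter]) -> Dict[str, float]:
    """Assumed weighting: pooled term frequencies normalized to sum to 1."""
    pooled: Counter = Counter()
    for d in group:
        pooled.update(d)
    total = sum(pooled.values())
    return {t: c / total for t, c in pooled.items()}


if __name__ == "__main__":
    for termset, group in granules(DOCS).items():
        print(sorted(termset), is_closed(termset, DOCS), weight_distribution(group))
    # A termset that is not any document's full term set need not be closed.
    print(is_closed(frozenset({"rough"}), DOCS))  # False: closure is {rough, set}
```

Every granule's term set passes the check because it is the complete term set of at least one document, so the intersection computed by `closure` can never shrink below it. A partial term set such as {rough} is not closed here, since every document containing "rough" also contains "set". In this toy example the granule {rough, set} pools two documents, giving weights of 3/7 for "rough" and 4/7 for "set".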
Copyright information
© 2007 Springer Berlin Heidelberg
About this chapter
Cite this chapter
Li, Y., Zhong, N. (2007). Mining Rough Association from Text Documents for Web Information Gathering. In: Peters, J.F., Skowron, A., Marek, V.W., Orłowska, E., Słowiński, R., Ziarko, W. (eds) Transactions on Rough Sets VII. Lecture Notes in Computer Science, vol 4400. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-71663-1_7
DOI: https://doi.org/10.1007/978-3-540-71663-1_7
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-71662-4
Online ISBN: 978-3-540-71663-1
eBook Packages: Computer Science, Computer Science (R0)