Abstract
Clustering is one of the most important approaches for mining and extracting knowledge from the web. This paper presents a method for clustering web data that uses a Bayesian network to find appropriate representatives for each cluster. With those representatives, more accurate clusters can be created. The contents of the web pages are also converted into vectors in a way that, first, reduces the number of dimensions and, second, solves the orthogonality problem. Experimental results show the high quality of the resulting clusters.
This paper is supported by the Iran Telecommunication Research Center (ITRC).
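The abstract's idea of building clusters around per-cluster representatives can be illustrated with a minimal sketch. It assumes the representatives are already given as vectors and that cosine similarity is the comparison measure; neither assumption comes from the abstract. It does not reproduce the paper's Bayesian-network selection of representatives or its vector construction. The names `cosine` and `assign_to_representatives` are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def assign_to_representatives(docs, representatives):
    """Label each document with the index of its most similar representative.

    Assumption: representatives are supplied by some earlier step (in the
    paper, a Bayesian network; here they are simply passed in).
    """
    labels = []
    for doc in docs:
        sims = [cosine(doc, rep) for rep in representatives]
        # max() returns the first index on ties, so ties go to the lower index.
        labels.append(max(range(len(sims)), key=sims.__getitem__))
    return labels


# Toy document vectors and representatives, invented for illustration only.
docs = [[1.0, 0.0, 0.2], [0.9, 0.1, 0.0], [0.0, 1.0, 0.8]]
representatives = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
labels = assign_to_representatives(docs, representatives)
print(labels)
```

In this toy input the first two documents fall under the first representative and the third under the second. How well such an assignment works depends on how the representatives are chosen, which is the step the paper addresses.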
© 2007 Springer Berlin Heidelberg
About this paper
Cite this paper
Chehreghani, M.H., Abolhassani, H. (2007). H-BayesClust: A New Hierarchical Clustering Based on Bayesian Networks. In: Alhajj, R., Gao, H., Li, J., Li, X., Zaïane, O.R. (eds) Advanced Data Mining and Applications. ADMA 2007. Lecture Notes in Computer Science, vol 4632. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-73871-8_59
DOI: https://doi.org/10.1007/978-3-540-73871-8_59
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-73870-1
Online ISBN: 978-3-540-73871-8
eBook Packages: Computer Science (R0)