Abstract
This paper presents a topic discovery approach based on combining the clusterings produced by multiple ant colonies. The algorithm consists of three parts. First, each document is represented as a feature vector in the vector space model. Then, a hypergraph model is used to combine the clusterings produced by three kinds of ant-based algorithms that use different moving speeds. Finally, the topic of each cluster is extracted by re-computing the term weights. Test results show that the number of topics can be determined adaptively and that clustering combination can improve system performance.
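The three-stage pipeline in the abstract can be illustrated with a minimal sketch. The sketch rests on several assumptions, none of which is taken from the paper. Term weights use plain tf × log(N/df). The three ant-based runs are not implemented; their label vectors are supplied by hand. Each cluster of each run becomes a hyperedge. The consensus step is a simple stand-in: documents are linked when they share a hyperedge in at least `threshold` of the runs, and connected components are taken. This is not necessarily the authors' partitioning method. The function names `tfidf_vectors`, `hyperedges`, `combine`, and `topic_terms` are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tfidf_vectors(docs):
    """Map each tokenized document to a sparse {term: tf * log(N/df)} dict (assumed weighting)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return [{t: c * math.log(n / df[t]) for t, c in Counter(doc).items()} for doc in docs]

def hyperedges(clusterings):
    """Turn every cluster of every clustering into a hyperedge (a set of document ids)."""
    edges = []
    for labels in clusterings:
        groups = defaultdict(set)
        for doc_id, label in enumerate(labels):
            groups[label].add(doc_id)
        edges.extend(groups.values())
    return edges

def combine(clusterings, threshold=0.5):
    """Hypothetical consensus step: link two documents if they share a hyperedge in at
    least `threshold` of the clusterings, then return connected components as labels.
    The number of consensus clusters is not fixed in advance."""
    n, r = len(clusterings[0]), len(clusterings)
    shared = Counter()  # (i, j) -> number of clusterings that put i and j together
    for edge in hyperedges(clusterings):
        members = sorted(edge)
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                shared[(members[a], members[b])] += 1
    parent = list(range(n))  # union-find over documents

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (i, j), count in shared.items():
        if count / r >= threshold:
            parent[find(i)] = find(j)
    roots = {}
    # Relabel components 0, 1, 2, ... in order of first appearance.
    return [roots.setdefault(find(d), len(roots)) for d in range(n)]

def topic_terms(vectors, labels, k=3):
    """Re-compute term weights per cluster by summing member weights; return the top-k terms."""
    weights = defaultdict(Counter)
    for vec, label in zip(vectors, labels):
        weights[label].update(vec)  # Counter.update adds values from a mapping
    return {label: [t for t, _ in w.most_common(k)] for label, w in weights.items()}

if __name__ == "__main__":
    docs = [["ant", "ant", "colony", "sort"], ["ant", "ant", "colony", "cluster"],
            ["text", "text", "topic", "term"], ["text", "text", "topic", "weight"]]
    runs = [[0, 0, 1, 1], [0, 0, 0, 1], [1, 1, 0, 0]]  # hypothetical outputs of three ant runs
    labels = combine(runs)
    print(labels)                                            # [0, 0, 1, 1]
    print(topic_terms(tfidf_vectors(docs), labels, k=1))     # {0: ['ant'], 1: ['text']}
```

In this toy run, documents 0 and 1 are grouped by all three runs and documents 2 and 3 by two of them. The 0.5 threshold therefore yields two topics without the number being specified. Raising `threshold` to 1.0 keeps only the unanimous pair and splits the remaining documents.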
Copyright information
© 2005 Springer-Verlag Berlin Heidelberg
About this paper
Cite this paper
Yang, Y., Kamel, M., Jin, F. (2005). Topic Discovery from Document Using Ant-Based Clustering Combination. In: Zhang, Y., Tanaka, K., Yu, J.X., Wang, S., Li, M. (eds) Web Technologies Research and Development - APWeb 2005. APWeb 2005. Lecture Notes in Computer Science, vol 3399. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-31849-1_11
DOI: https://doi.org/10.1007/978-3-540-31849-1_11
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-25207-8
Online ISBN: 978-3-540-31849-1
eBook Packages: Computer Science, Computer Science (R0)