Clustering Deep Web Databases Semantically

Song, Ling; Ma, Jun; Yan, Po; Lian, Li; Zhang, Dongmei

doi:10.1007/978-3-540-68636-1_35

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 4993)

Included in the following conference series:

Asia Information Retrieval Symposium

Abstract

Deep Web database clustering is a key operation in organizing Deep Web resources. Traditional approaches compute similarity as cosine similarity in the Vector Space Model (VSM). However, cosine similarity cannot capture the semantic similarity between the contents of two databases. This paper discusses how to cluster Deep Web databases semantically. First, a fuzzy semantic measure is proposed, which integrates ontology and fuzzy set theory to compute the semantic similarity between the visible features of two Deep Web forms. Then a hybrid Particle Swarm Optimization (PSO) algorithm is provided for Deep Web database clustering. Finally, the clustering results are evaluated according to Average Similarity of Document to the Cluster Centroid (ASDC) and Rand Index (RI). Experiments show that: 1) the hybrid PSO approach has higher ASDC values than the PSO and K-Means approaches, which means it achieves higher intra-cluster similarity and lower inter-cluster similarity; 2) the clustering results based on fuzzy semantic similarity have higher ASDC and RI values than those based on cosine similarity, which suggests that the fuzzy semantic similarity approach can explore latent semantics.
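
As a rough illustration of two standard quantities named in the abstract, the sketch below computes cosine similarity between term-weight vectors and the Rand Index between two labelings. It is a minimal sketch and not the authors' implementation: the function names, the three-term vocabulary, and the example vectors are assumptions made for illustration only. The fuzzy semantic measure, the hybrid PSO algorithm, and ASDC are not reproduced here because the abstract does not define them.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # convention chosen here: an empty vector is similar to nothing
    return dot / (norm_u * norm_v)


def rand_index(labels_true, labels_pred):
    """Fraction of item pairs on which two labelings agree.

    A pair agrees if both labelings put the two items in the same cluster,
    or both put them in different clusters.
    """
    if len(labels_true) != len(labels_pred):
        raise ValueError("labelings must have the same length")
    pairs = list(combinations(range(len(labels_true)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = sum(
        (labels_true[i] == labels_true[j]) == (labels_pred[i] == labels_pred[j])
        for i, j in pairs
    )
    return agree / len(pairs)


# Hypothetical vocabulary: ["flight", "airfare", "hotel"].
form_a = [1, 0, 0]  # a form whose visible text mentions only "flight"
form_b = [0, 1, 0]  # a form whose visible text mentions only "airfare"

# The two forms share no literal term, so their cosine similarity is zero
# even though the terms are related in meaning.
print(cosine_similarity(form_a, form_b))

# Rand Index ignores cluster names: swapping the labels changes nothing.
print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

The first call shows the limitation the abstract attributes to cosine similarity: related terms with no literal overlap contribute nothing to the score.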

Author information

Authors and Affiliations

School of Computer Science & Technology, Shandong University, 250061, China
Ling Song, Jun Ma, Po Yan, Li Lian & Dongmei Zhang
School of Computer Science & Technology, Shandong Jianzhu University, 250101, China
Ling Song & Dongmei Zhang

Editor information

Hang Li, Ting Liu, Wei-Ying Ma, Tetsuya Sakai, Kam-Fai Wong, Guodong Zhou

Cite this paper

Song, L., Ma, J., Yan, P., Lian, L., Zhang, D. (2008). Clustering Deep Web Databases Semantically. In: Li, H., Liu, T., Ma, WY., Sakai, T., Wong, KF., Zhou, G. (eds) Information Retrieval Technology. AIRS 2008. Lecture Notes in Computer Science, vol 4993. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-68636-1_35

DOI: https://doi.org/10.1007/978-3-540-68636-1_35
Publisher Name: Springer, Berlin, Heidelberg
Print ISBN: 978-3-540-68633-0
Online ISBN: 978-3-540-68636-1
eBook Packages: Computer Science, Computer Science (R0)